The implementation seems straightforward (very similar to everyone else's: HiDream-E1, ICEdit, DreamO, etc.); the magic is in the data curation, the details of which are only lightly shared.
I haven't been following image generation models closely. At a high level, is this new Flux model still diffusion-based, or have they moved to block autoregressive (possibly with diffusion for upscaling), similar to 4o?
Diffusion-based. There is no point in moving to autoregressive unless you are also training a multimodal LLM, which these companies are not doing.