I'm still learning how best to work with Z-Image, so I can't claim to fully know what I'm doing. v1.0 of this workflow used SDXL to refine the whole image, which turned out to be less than ideal. With v1.1, I'm taking a different approach that I'm not entirely sure is necessary, although it does seem to give me the best results so far.
Essentially, I'm borrowing the high/low noise split that WAN 2.2 uses for video generation: 3 steps on the "high" sampler, returning with leftover noise, then the remaining 7 steps on the "low" sampler. In my A/B testing this seemed to give a better result than running all 10 steps in a single sampler. I'm also generating at a high resolution (1536x1920) and not doing any upscaling.
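
To make the split concrete, here's a rough sketch of how the two passes line up, written as plain Python dicts whose keys mirror ComfyUI's KSampler (Advanced) widgets. The cfg, sampler_name, and scheduler values are just placeholders, not recommendations; only the 3/7 step split and the noise handling are the point.

```python
# Illustrative summary of the two-pass split. Keys follow ComfyUI's
# KSampler (Advanced) widgets; cfg/sampler/scheduler values are assumed
# placeholders, not settings taken from this workflow.

TOTAL_STEPS = 10
HIGH_STEPS = 3  # steps handled by the "high" noise pass

high_pass = {
    "add_noise": True,                   # fresh noise only on the first pass
    "steps": TOTAL_STEPS,
    "cfg": 4.0,                          # placeholder
    "sampler_name": "euler",             # placeholder
    "scheduler": "simple",               # placeholder
    "start_at_step": 0,
    "end_at_step": HIGH_STEPS,
    "return_with_leftover_noise": True,  # hand off a still-noisy latent
}

low_pass = {
    "add_noise": False,                  # continue from the leftover noise
    "steps": TOTAL_STEPS,
    "cfg": 4.0,                          # placeholder
    "sampler_name": "euler",             # placeholder
    "scheduler": "simple",               # placeholder
    "start_at_step": HIGH_STEPS,
    "end_at_step": TOTAL_STEPS,          # remaining 7 steps
    "return_with_leftover_noise": False, # fully denoise before decoding
}
```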
Theoretically, the "high" noise sampler could run with different sampler/scheduler/CFG settings than the "low" pass, which might improve composition and quality further.
Or this strategy could all be a complete waste of time. I'm a little in over my head with this workflow, so I would really appreciate any and all constructive feedback or advice.