Z image base controlnet duo sampling

Introduction

Everything written below is based purely on my personal experience and observations. I can’t guarantee everything is correct, so take it as reference material only. Discussion and corrections are always welcome.

As an open-source model focused on photorealistic image generation, Z-Image-Base (ZIB; see the official page) is easily one of the current SOTA models. Its image quality is extremely strong, and its prompt adherence and fine-detail control are honestly on another level. For example, you can modify the style of a single button purely through prompts without noticeably affecting the rest of the image.

However, ZIB is much weaker when handling spatial relationships between multiple subjects, especially multiple characters. Basic two-person poses are usually manageable, but uncommon poses often fail badly, with frequent misalignment and anatomical issues.
For scenes with three or more characters, or with more complex interactions, even a very detailed prompt only occasionally produces good results, and the overall experience feels more like rolling a gacha. Most outputs end up with broken anatomy or spatial errors.

Because of this, I started experimenting with ControlNet + multiple processors to guide composition more reliably.

I tested the following combination: ZIB + 8 Steps LoRA + ControlNet.

With all three together, I found that ZIB’s ControlNet can solve certain structural problems, but the resulting image quality still often feels lacking:

  • overall sharpness is weaker

  • lighting tends to look flat and gray

  • prompts related to lighting respond poorly

  • ControlNet strength at 1.0 often produces terrible-looking results

Under the 8 Steps LoRA setup, adjusting CFG (usually between 1 and 2) can sometimes help, but the workflow still feels heavily constrained by the base image. Some reference images even produce extremely strange lighting behavior.

Overall, the experience feels very different from SDXL workflows, where high-quality results often come almost out of the box with minimal tweaking. I’ve also seen similar complaints on Reddit about ZIB’s ControlNet implementation feeling relatively weak.

Another important issue is that in ZIB, generation resolution directly affects focal length and depth-of-field behavior. Different resolutions can produce dramatically different compositions and camera feel. Solving this became one of the key goals of my workflow.


Workflow

After a lot of experimentation, I ended up building a relatively simple ControlNet workflow that gave me much more satisfying results.

The core idea is straightforward:

  1. Use any checkpoint you like together with the 8 Steps LoRA
    (This is extremely important. For ZIB, the 8 Steps LoRA is almost mandatory in my opinion. It significantly improves image quality and detail rendering.
    Of course, if your checkpoint already has something similar baked in, you don’t need to add it separately.)

  2. Use ControlNet together with a suitable resolution — usually a relatively low resolution — and perform a short initial sampling pass (around 2 steps)
    This stage establishes a solid base latent with correct character positioning and spatial relationships.

  3. Upscale the latent to the target resolution, then remove ControlNet and continue sampling using only the checkpoint + 8 Steps LoRA
    This preserves the structural consistency from ControlNet while dramatically improving image quality, lighting, and detail in the second pass.
    More importantly, you can freely adjust the second-pass resolution without heavily affecting focal length or depth-of-field behavior.

  4. The “small-resolution first pass + large-resolution second pass” approach also works well outside of ControlNet workflows
    It helps reduce the coupling between focal length behavior and image sharpness/resolution.
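
The two passes described above can be written down as a small plan. This is only a sketch of the bookkeeping, not the actual node graph: `plan_two_pass`, `PassConfig`, and the snap-to-16 rounding are illustrative assumptions, while the defaults (short edge 768, long edge 1536, 2 + 6 steps, strength 1.0) are the values used in this guide.

```python
from dataclasses import dataclass
from typing import Tuple


def snap(value: float, multiple: int = 16) -> int:
    """Round to the nearest multiple of `multiple` (assumed to be a safe size)."""
    return max(multiple, int(round(value / multiple)) * multiple)


@dataclass(frozen=True)
class PassConfig:
    width: int
    height: int
    steps: int
    controlnet_strength: float  # 0.0 means ControlNet is not applied


def plan_two_pass(aspect_w: int, aspect_h: int,
                  first_short_edge: int = 768,
                  final_long_edge: int = 1536,
                  first_steps: int = 2,
                  second_steps: int = 6,
                  controlnet_strength: float = 1.0) -> Tuple[PassConfig, PassConfig]:
    ratio = aspect_w / aspect_h  # width divided by height
    if ratio >= 1:  # landscape or square
        w1, h1 = snap(first_short_edge * ratio), snap(first_short_edge)
        w2, h2 = snap(final_long_edge), snap(final_long_edge / ratio)
    else:           # portrait
        w1, h1 = snap(first_short_edge), snap(first_short_edge / ratio)
        w2, h2 = snap(final_long_edge * ratio), snap(final_long_edge)
    first = PassConfig(w1, h1, first_steps, controlnet_strength)
    second = PassConfig(w2, h2, second_steps, 0.0)  # ControlNet removed in pass 2
    return first, second


first, second = plan_two_pass(2, 3)  # 2:3 portrait
print(first)
print(second)
```

The first pass is sized by its short edge and the second by its long edge, matching how the two resolutions are specified in the Key Parameters section.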


Recommended Models

Personally, I use a 1:1 merge of:

  • Big Love

  • Pornmaster

I feel Big Love performs very well in anatomy and clothing structure, while Pornmaster produces character aesthetics that fit my taste better.
The merged result feels surprisingly balanced in actual use.


Recommended Sampler

Under ZIB + 8 Steps workflows, I strongly recommend samplers that inject noise at every step.
These samplers consistently produce better anatomy and micro-detail quality than more deterministic alternatives.

My personal recommendation is:

  • Euler A
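
To make "injects noise at every step" concrete, here is a minimal sketch of one Euler ancestral update in the classic sigma-based (k-diffusion style) formulation. This is an assumption about the general technique, not a description of how any particular UI implements Euler A for ZIB; the variant actually used there may be parameterized differently. Scalars stand in for latent tensors.

```python
import math


def ancestral_step(sigma_from: float, sigma_to: float, eta: float = 1.0):
    """Split a step into a deterministic target (sigma_down) and fresh noise (sigma_up)."""
    if eta == 0.0 or sigma_to == 0.0:
        return sigma_to, 0.0  # no noise injection: plain Euler step
    sigma_up = min(
        sigma_to,
        eta * math.sqrt(sigma_to**2 * (sigma_from**2 - sigma_to**2) / sigma_from**2),
    )
    sigma_down = math.sqrt(sigma_to**2 - sigma_up**2)
    return sigma_down, sigma_up


def euler_ancestral_update(x, denoised, sigma_from, sigma_to, noise, eta=1.0):
    """One update: Euler move down to sigma_down, then add new noise of scale sigma_up."""
    sigma_down, sigma_up = ancestral_step(sigma_from, sigma_to, eta)
    d = (x - denoised) / sigma_from          # direction estimated from the model output
    x = x + d * (sigma_down - sigma_from)    # deterministic part
    return x + noise * sigma_up              # stochastic part (fresh noise every step)
```

With `eta=0` the noise term vanishes and the update reduces to a deterministic Euler step, which is the kind of sampler this guide compares against.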


Key Parameters

In this workflow, there are basically only three major parameters you need to tune.


1. ControlNet Resolution (First-Pass Resolution)

The first sampling pass establishes the base composition latent, so this resolution matters a lot.

I usually default to:

  • short edge = 768

This feels like a very balanced starting point.

In ZIB, lower resolutions effectively produce a “longer focal length / shallower depth-of-field” look:

  • subjects become larger

  • background elements become fewer and more compressed

  • the model focuses more attention on the main subjects

  • prompt responsiveness for character details improves noticeably

This parameter can be adjusted depending on your goal:

Situations where lowering or increasing this resolution helps

  1. You already know the kind of depth-of-field or focal feel you want
    Adjust this value to match the desired camera look.

  2. Your subject differs heavily from the reference image
    For example:

    • different body type

    • different pose

    • weak prompt responsiveness

    Lowering the first-pass resolution enlarges the subject and reduces the influence of the original image, making the output follow your prompt more strongly.

  3. The reference image contains distracting or unwanted elements
    Lowering the resolution can help suppress them.
    Though in many cases, adjusting ControlNet strength is even more effective.

  4. Extremely low resolutions (for example 128) are usually too destructive
    The initial latent becomes too small, causing heavy detail loss and significantly reducing adherence to the reference image.
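
A rough way to see why very low first-pass resolutions are destructive is to count how much latent the model actually gets to work with. The sketch below assumes an 8x VAE downscale and 2x2 latent patches, which is common for DiT-style models but which I have not verified for ZIB; `latent_grid` is an illustrative helper, not part of any tool.

```python
def latent_grid(width: int, height: int, vae_factor: int = 8, patch: int = 2):
    """Return (latent_width, latent_height, token_count) under the assumed factors."""
    lat_w, lat_h = width // vae_factor, height // vae_factor
    tokens = (lat_w // patch) * (lat_h // patch)
    return lat_w, lat_h, tokens


# Compare a few first-pass short edges for a 2:3 portrait image.
for short_edge in (768, 384, 128):
    width, height = short_edge, short_edge * 3 // 2
    print(short_edge, latent_grid(width, height))
```

Under these assumptions, a 768 short edge gives 3456 tokens, 384 gives 864, and 128 gives only 96, so the composition laid down in the first pass has very little room to encode pose and layout.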


2. ControlNet Strength

This controls how strongly ControlNet influences the generation.

I usually use:

  • 1.0

Without the second sampling pass, 1.0 often produces awful-looking results.
But in the dual-sampling workflow, 1.0 works surprisingly well:

  • strong structural adherence

  • while still allowing the second pass to restore image quality and details


3. Final Resolution (Second-Pass Resolution)

This is your final upscale sampling resolution.

I usually use:

  • long edge = 1536

This tends to produce clean and detailed images while keeping rendering mistakes relatively manageable.

Since the base latent structure has already been established during the first pass, the second-pass resolution has much less influence on focal length and depth-of-field behavior.
This gives you much more freedom to scale image quality independently.

Higher resolutions produce:

  • more sharpness

  • more texture detail

  • richer micro-details

But in very complex scenes, excessively large resolutions can also introduce:

  • incorrect clothing details

  • broken background objects

  • random hallucinated elements

In most cases, I avoid going beyond:

  • long edge = 1920

The second-pass resolution generally has relatively little impact on prompt adherence.


Personal Experience & Tuning Tips

My default starting setup is usually:

  • first-pass resolution: 768 (short edge)

  • ControlNet strength: 1.0

  • second-pass resolution: 1536 (long edge)

Then I adjust from there based on the results.

If the generated subject differs too much from what I want

I primarily reduce the first-pass resolution to weaken the influence of the original image.

If that still isn’t enough — or if the resolution becomes so low that important details disappear — I also reduce ControlNet strength.

Typical lower limits for me are roughly:

  • first-pass resolution ≥ 384

  • strength ≥ 0.8

Though in special cases, I’ve gone as low as:

  • resolution = 256

  • strength = 0.5
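
The tuning order above (resolution first, strength second, each with a floor) can be sketched as a small helper. The floors default to the typical limits given here (384 and 0.8) and can be lowered for special cases; the step sizes of 128 and 0.1 are assumptions chosen for illustration, not values from my workflow.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Settings:
    first_pass_short_edge: int = 768
    controlnet_strength: float = 1.0


def loosen(s: Settings, min_edge: int = 384, min_strength: float = 0.8,
           edge_step: int = 128, strength_step: float = 0.1) -> Settings:
    """Return slightly looser settings; unchanged if both floors are reached."""
    # First lever: shrink the first-pass resolution.
    if s.first_pass_short_edge > min_edge:
        new_edge = max(min_edge, s.first_pass_short_edge - edge_step)
        return replace(s, first_pass_short_edge=new_edge)
    # Second lever: only once resolution is at its floor, reduce strength.
    if s.controlnet_strength > min_strength:
        new_strength = max(min_strength, round(s.controlnet_strength - strength_step, 2))
        return replace(s, controlnet_strength=new_strength)
    return s


s = Settings()
for _ in range(6):
    print(s)
    s = loosen(s)
```

Calling `loosen(s, min_edge=256, min_strength=0.5)` corresponds to the special-case limits.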


If the reference image contains many characters or very small subjects

1536 may not provide enough detail density.

In those cases, I increase the second-pass resolution moderately to improve detail rendering.

Usually I stay below:

  • 1920


Sampling Step Distribution

I usually use:

  • first pass = 2 steps

You can adjust this depending on your needs.

For example:

  • if adherence to the reference image is insufficient,
    you can slightly increase first-pass steps

Personally, I generally keep:

  • first pass ≤ 3 steps

  • second pass ≥ 6 steps
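
A minimal sketch of the step split, assuming both passes share one step schedule and the UI lets you give start/end step indices per sampler (for example, ComfyUI's KSamplerAdvanced exposes `start_at_step` and `end_at_step`). Whether your setup works this way is an assumption to check; `split_steps` is a hypothetical helper that only validates the limits above and returns the index ranges.

```python
def split_steps(first_steps: int = 2, second_steps: int = 6,
                max_first: int = 3, min_second: int = 6) -> dict:
    """Validate the split and return (start, end) step ranges for each pass."""
    if not 1 <= first_steps <= max_first:
        raise ValueError(f"first pass should use 1..{max_first} steps")
    if second_steps < min_second:
        raise ValueError(f"second pass should use at least {min_second} steps")
    total = first_steps + second_steps
    return {
        "total": total,
        "first": (0, first_steps),        # ControlNet pass at low resolution
        "second": (first_steps, total),   # upscaled pass without ControlNet
    }


print(split_steps())        # default 2 + 6 split
print(split_steps(3, 6))    # one extra first-pass step for stronger adherence
```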


When nothing seems to work

If repeated parameter tuning still fails to produce the result I want, I often take the best partially successful output and use it as the new reference image.

Then I repeat the process iteratively.

Surprisingly often, this works much better than endlessly fighting the original reference image.
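
The iterative loop can be sketched as follows. Here `generate` and `score` are hypothetical callables: in practice `generate` is a run of the workflow on a reference image and `score` is simply your own judgment of each output, so this is a description of the procedure rather than something to automate as written.

```python
def refine_reference(reference, generate, score, rounds=3, good_enough=0.9):
    """Repeatedly promote the best output so far to be the next reference image."""
    best, best_score = reference, float("-inf")
    for _ in range(rounds):
        candidates = generate(reference)      # one batch from the current reference
        top = max(candidates, key=score)
        if score(top) > best_score:
            best, best_score = top, score(top)
        if best_score >= good_enough:
            break                             # result is acceptable, stop iterating
        reference = best                      # best partial success becomes the reference
    return best
```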


End

Hopefully this workflow can help people struggling with ZIB ControlNet setups.

And finally, good luck to everyone — hope you all generate the images you actually want.
