Z-Image / Lumina 2 / NewBie fp16 ComfyUI plugin for old Nvidia GPUs
Model description
A ComfyUI plugin that patches Lumina 2-based models to run fp16 "safely" on old Nvidia GPUs (RTX 20xx and earlier). No clamping. Identical results.
"Lumina 2-based" here means Z-Image, Lumina 2, and NewBie.
These models don't support fp16: internal layers overflow and you get NaN as output (a black or pure-noise image).
Since ComfyUI v0.4, overflows are clamped instead, so you no longer get NaN, but clamping changes the layer outputs drastically, and your final image changes with them.
This patch handles overflows in fp16 mode "safely": whenever an overflow is detected, it automatically recomputes that layer in fp32. No clamping, thus identical results.
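The idea, as a minimal PyTorch sketch (illustrative only; the function name and the bare `nn.Linear` stand in for whatever the plugin actually wraps inside ComfyUI):

```python
import torch
import torch.nn.functional as F

def linear_fp16_safe(layer: torch.nn.Linear, x: torch.Tensor) -> torch.Tensor:
    # Try the cheap fp16 forward pass first.
    out = F.linear(x, layer.weight, layer.bias)
    if not torch.isfinite(out).all():
        # inf/NaN means the fp16 accumulation overflowed; redo the math in fp32.
        out = F.linear(
            x.float(),
            layer.weight.float(),
            None if layer.bias is None else layer.bias.float(),
        ).to(x.dtype)  # casting back is safe when only the intermediate sums overflowed
    return out
```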
To clarify, this plugin isn't published on GitHub as a normal plugin because it's very "dirty": "dirty" means it hot/monkey-patches the ComfyUI core code. That approach is terrible from a programming perspective, but it's the simplest one I can think of. I don't want to be toast.
Tested on ComfyUI v0.7.
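For the curious, the monkey-patch pattern looks roughly like this (shown on `torch.nn.Linear` purely for illustration; the real plugin replaces functions inside ComfyUI's own modules at startup):

```python
import torch

# Keep a reference to the original method, then swap in a wrapper.
_original_forward = torch.nn.Linear.forward

def _patched_forward(self, x):
    out = _original_forward(self, x)
    # ...the overflow check / fp32 recompute from the sketch above would go here...
    return out

torch.nn.Linear.forward = _patched_forward  # every Linear now goes through the wrapper
```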
How to use:
Put this file in the ComfyUI "custom_nodes" dir.
Open it with a text editor and adjust the settings (a hypothetical example follows these steps).
Restart ComfyUI.
Add "ModelComputeDtype" node to your workflow and set dtype to "fp16".
Note about "reduce weight":
A linear-algebra trick that avoids most (~90%) overflows without changing the final results. Basically: if A x B overflows during accumulation, compute (A / 32) x B x 32 instead (see the toy demo after this note).
Model weights are modified at load time, so loading LoRAs dynamically is not supported: once the base weights change, the LoRA weight patch becomes invalid.
However, you can merge your LoRA and save a checkpoint in advance. FYI, you need to disable "reduce weight" while making that checkpoint.
Supports fp16/bf16/scaled fp8 models. Does not support pure fp8 or GGUF models.
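A toy demonstration of the trick (numbers chosen so a sequential fp16 accumulation overflows; the factor 32 is from the description above):

```python
import torch

# Partial sums overflow fp16 (max 65504), but the true sum is 0.
x = torch.tensor([40000., 40000., -40000., -40000.], dtype=torch.float16)

acc = torch.tensor(0., dtype=torch.float16)
for v in x:
    acc = acc + v  # 40000 + 40000 -> inf, and inf never recovers
print(acc)         # inf

acc = torch.tensor(0., dtype=torch.float16)
for v in x / 32:   # dividing by a power of two is exact in fp16
    acc = acc + v  # partial sums now peak at 2500, well inside fp16 range
print(acc * 32)    # 0.0 -- the exact result, no clamping involved
```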
Note about Z-Image:
All overflows can be handled "safely". No clamping. Identical results.
If you enable "reduce weight", there will be no overflow at all.
Note about Lumina 2:
All overflows can be handled "safely". No clamping. Identical results.
If you enable "reduce weight", only 2 layers need to recompute (total 30)
Note about NewBie:
NewBie does NOT support fp16, same as Lumina 2. I don't know why its author claims fp16 support in diffusers; it might be a copy-paste typo.
The text embedding must be clamped: it overflows before even reaching the DiT. The affected values appear to be outliers (~0.01%), and I saw no difference during testing, so results are still identical (see the sketch below).
If you enable "reduce weight", only 2 layers (out of 40) need the fp32 recompute.
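The clamp itself is the standard one, something like this (a sketch; `clamp_text_embedding` is an illustrative name, not the plugin's API):

```python
import torch

def clamp_text_embedding(emb: torch.Tensor) -> torch.Tensor:
    # Pin the rare outliers to the fp16 range before the DiT ever sees them.
    lim = torch.finfo(torch.float16).max  # 65504
    return emb.clamp(-lim, lim)
```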

