Z-Image Turbo [TensorCoreFP8]
Model description
Yes. 40% smaller, and 50% FASTER!
This is a new FP8 scaled checkpoint that supports the latest ComfyUI features: mixed precision, post-training calibration, and FP8 tensor core support.
This model ships with calibration metadata, and on supported hardware ComfyUI will do the calculations directly in FP8 instead of BF16. That is much faster (+50% it/s) than both BF16 and classic FP8 models (which only contain FP8-quantized weights but still do no FP8 calculations).
About Z-image: https://huggingface.co/Tongyi-MAI/Z-Image-Turbo
In short:
Mixed precision:
Early and final layers are kept in BF16; the middle layers are FP8. That's why this model is 1GB larger than a classic FP8 model.
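If you want to check this yourself, here is a minimal sketch that counts which tensors are stored in FP8 vs BF16 (assuming the checkpoint is a regular .safetensors file and the safetensors package is installed; the file name below is just a placeholder):

```python
# Sketch: count how many tensors are stored in FP8 vs BF16 in the checkpoint.
# "z_image_turbo_fp8.safetensors" is a placeholder path, not the official file name.
from collections import Counter
from safetensors import safe_open

counts = Counter()
with safe_open("z_image_turbo_fp8.safetensors", framework="pt", device="cpu") as f:
    for name in f.keys():
        dtype = f.get_tensor(name).dtype  # e.g. torch.float8_e4m3fn or torch.bfloat16
        counts[str(dtype)] += 1

print(counts)  # expect float8 for the middle blocks and bfloat16 for the early/final layers
```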
Post-training calibration and FP8 tensor core support:
If you have a newer Nvidia GPU (probably RTX 4xxx and later):
Those GPUs have native hardware support for FP8 math. This model has post-training calibration metadata, and ComfyUI will automatically utilize those fancy tensor cores to do the calculations directly in FP8 instead of BF16.
On a 4090, compared to the BF16 model:
classic FP8 scaled model: -8% it/s (fp8 -> bf16 dequantization overhead)
classic FP8 scaled model + torch.compile: +11% it/s
this model: +31% it/s
this model + torch.compile: +60% it/s
On 5xxx GPUs it should be faster than the numbers above, thanks to newer tensor cores and better FP8 support. Not tested.
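If you are not sure which camp your card falls into, a quick way to check is the CUDA compute capability: FP8 tensor cores arrived with Ada (compute capability 8.9, i.e. RTX 4xxx), and Hopper/Blackwell cards are newer than that. A minimal sketch, assuming PyTorch with CUDA is installed:

```python
# Sketch: rough check for native FP8 tensor cores (compute capability 8.9 or newer).
import torch

if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability(0)
    has_fp8 = (major, minor) >= (8, 9)
    name = torch.cuda.get_device_name(0)
    print(f"{name}: compute capability {major}.{minor} ->",
          "FP8 tensor cores available" if has_fp8 else "no FP8 tensor cores")
else:
    print("No CUDA device found")
```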
To use torch.compile, I recommend the torch.compile nodes from "ComfyUI-KJNodes".
However, about torch.compile: as of writing this (11/28/2025), ComfyUI v0.3.75 has a small bug and can't torch.compile FP8 models that use tensor cores. The bug has already been fixed, so either update to ComfyUI v0.3.76 once it's out and retry, or switch to the master branch for now.
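For reference, outside of ComfyUI the compile step boils down to wrapping the model with torch.compile. A minimal sketch of that call (the toy module is just a stand-in, not the real Z-Image architecture, and this is not the KJNodes code):

```python
# Sketch: what compiling a model with torch.compile looks like in plain PyTorch.
# TinyBlock is a stand-in module, not the real Z-Image transformer.
import torch
import torch.nn as nn

class TinyBlock(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(256, 256)

    def forward(self, x):
        return torch.nn.functional.silu(self.linear(x))

model = TinyBlock().to("cuda", dtype=torch.bfloat16)
compiled = torch.compile(model)  # first call compiles, later calls reuse the compiled graph

x = torch.randn(8, 256, device="cuda", dtype=torch.bfloat16)
with torch.no_grad():
    print(compiled(x).shape)
```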
If your GPU does not have FP8 tensor cores:
No worries. This model can still save you ~40% VRAM.
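That saving comes from weight-only quantization: the weights sit in VRAM as FP8 and are only dequantized to BF16 right before each matmul (which is also where the -8% dequantization overhead above comes from). A minimal, illustrative sketch of the idea, not ComfyUI's actual code path:

```python
# Sketch: weight-only FP8 storage with on-the-fly dequantization to BF16.
# Illustrative only -- not ComfyUI's actual code path.
import torch

w_bf16 = torch.randn(4096, 4096, dtype=torch.bfloat16)

# Per-tensor scale so the values fit into float8_e4m3fn's range (max ~448).
scale = w_bf16.abs().max().float() / 448.0
w_fp8 = (w_bf16.float() / scale).to(torch.float8_e4m3fn)  # stored: 1 byte per weight

x = torch.randn(1, 4096, dtype=torch.bfloat16)
w_deq = w_fp8.to(torch.bfloat16) * scale.to(torch.bfloat16)  # dequantize right before the matmul
y = x @ w_deq.t()

# 2 bytes/weight (BF16) vs 1 byte/weight (FP8) on the quantized layers.
print(w_bf16.element_size(), "vs", w_fp8.element_size())
```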
FYI: this model (i.e. the way ComfyUI uses FP8 tensor cores for the linear layers) is compatible with all kinds of attention optimizations (sage attention, etc.). But that's another topic.
