🚀 Z-Image Turbo FP8 Hires Workflow (Low VRAM Optimized)

This is a ComfyUI workflow designed for low-VRAM users. It combines FP8-quantized models with a latent upscale pass to generate high-resolution images (1024x1792) quickly while keeping memory usage low.

✨ Key Features

Very Low VRAM Usage: Full FP8 pipeline (both the model and the text encoder) to reduce the memory footprint.
Fast: Built for the Turbo model and a small number of sampling steps.
Hires Fix Pipeline: Uses Latent Upscale plus a second-pass KSampler to produce crisp details without a heavy VRAM cost.
AuraFlow Sampling: Uses the ModelSamplingAuraFlow node to configure model sampling.

📂 Models Required & Downloads

For the workflow to load correctly, download the following models and place them in the corresponding ComfyUI folders:

1. UNet Model (Place in `models/unet/`)

File Name: z-image-turbo-fp8-e4m3fn.safetensors
Download: HuggingFace - Z-Image-Turbo-FP8

2. CLIP / Text Encoder (Place in `models/clip/`)

File Name: qwen3-4b-fp8-scaled.safetensors
Download: HuggingFace - Qwen3-4B-FP8
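
A quick way to confirm both files are in place before loading the workflow is a small check script. This is a minimal sketch: the file names and folders come from the list above, while the function name `find_missing_models` and the `ComfyUI` root path are assumptions to adjust for your install.

```python
from pathlib import Path

# Folders and file names as listed above.
REQUIRED_MODELS = {
    "models/unet": "z-image-turbo-fp8-e4m3fn.safetensors",
    "models/clip": "qwen3-4b-fp8-scaled.safetensors",
}


def find_missing_models(comfyui_root):
    """Return the relative paths of required model files that are not on disk."""
    root = Path(comfyui_root)
    missing = []
    for folder, file_name in REQUIRED_MODELS.items():
        if not (root / folder / file_name).is_file():
            missing.append(f"{folder}/{file_name}")
    return missing


if __name__ == "__main__":
    # "ComfyUI" is an assumed install location; change it to your own path.
    for path in find_missing_models("ComfyUI"):
        print("missing:", path)
```

If the script prints nothing, both files were found under the given root.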

⚙️ Key Settings & Configuration

This workflow uses a two-pass system: a base generation pass, a latent upscale, and a second sampling pass on the upscaled latent. Use the following settings for the best results:

🔹 Phase 1: Base Generation

Latent Size: Generates at a lower initial resolution (e.g., 512x896) to save compute resources.

🔹 Phase 2: Latent Upscale

Upscale Method: Uses LatentUpscaleBy.
Scale Factor: Default is 2 (resulting in a final output of 1024x1792).
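
The resolution arithmetic can be checked with a few lines of Python. This is only an illustration of the numbers quoted above; the helper name `upscaled_size` is hypothetical and is not part of ComfyUI.

```python
def upscaled_size(width, height, scale_by=2.0):
    """Pixel size after scaling both sides by the same factor."""
    return round(width * scale_by), round(height * scale_by)


base_width, base_height = 512, 896
final_width, final_height = upscaled_size(base_width, base_height)

print(final_width, final_height)  # 1024 1792

# Doubling both sides quadruples the pixel count the second pass works on.
print((final_width * final_height) // (base_width * base_height))  # 4
```

Because the first pass runs on a quarter of the final pixel count, most of the initial sampling happens at the cheaper resolution.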

🔹 Phase 3: Hires Fix (Refiner)

This step is crucial for image clarity and detail:

Sampler: res_multistep (Highly Recommended).
Denoise: Recommended range 0.5 - 0.6.
- < 0.5: Changes are minimal; the image may remain slightly blurry.
- > 0.6: Adds more detail, but setting this too high may alter the image structure or cause hallucinations.
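
For readers who drive ComfyUI through its API-format JSON, Phases 2 and 3 can be sketched as two linked nodes. This is a rough sketch under assumptions, not the workflow file itself: the node ids, the function name `build_hires_pass`, and the values for `seed`, `steps`, `cfg`, `scheduler`, and `upscale_method` are placeholders that the description above does not specify. Only the sampler name, the denoise range, and the scale factor come from the settings listed here.

```python
def build_hires_pass(model_ref, positive_ref, negative_ref, base_latent_ref,
                     denoise=0.55, scale_by=2.0):
    """Return API-format nodes for the latent upscale and the second sampling pass.

    Each *_ref is a [node_id, output_index] link to a node defined elsewhere
    in the workflow (model loader, text encoders, first-pass sampler).
    """
    if not 0.0 < denoise <= 1.0:
        raise ValueError("denoise must be in the range (0, 1]")
    return {
        # Node ids "10" and "11" are hypothetical.
        "10": {
            "class_type": "LatentUpscaleBy",
            "inputs": {
                "samples": base_latent_ref,
                "upscale_method": "nearest-exact",  # placeholder, not from the source
                "scale_by": scale_by,
            },
        },
        "11": {
            "class_type": "KSampler",
            "inputs": {
                "model": model_ref,
                "positive": positive_ref,
                "negative": negative_ref,
                "latent_image": ["10", 0],  # output of the upscale node
                "sampler_name": "res_multistep",
                "denoise": denoise,  # recommended range is 0.5 to 0.6
                # Placeholders: the description does not state these values.
                "seed": 0,
                "steps": 8,
                "cfg": 1.0,
                "scheduler": "simple",
            },
        },
    }


# Example links to hypothetical upstream nodes "1" to "4".
nodes = build_hires_pass(["1", 0], ["2", 0], ["3", 0], ["4", 0])
```

Keeping the denoise value as a single argument makes it easy to try 0.5, 0.55, and 0.6 and compare the results.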

📊 Performance Benchmark

Data based on actual testing:

GPU	Output Resolution	Time
NVIDIA RTX 5070 Ti	1024 x 1792	8 ~ 9 sec

📝 Usage Tips

Memory Management: If you are extremely limited on VRAM, ensure no other large models are loaded in the background.
Prompting: Since this uses the Qwen text encoder, it has strong natural language understanding. Detailed, sentence-based prompts work very well.
Troubleshooting: If you notice the image details breaking or looking "burnt," try slightly lowering the denoise value in the second KSampler.

Model Type	Workflows
Base Model	ZImageTurbo
Published	11/29/2025

