WAN 2.2 GGUF/start end frame/t2v/i2v 8gb/10 seconds to 4 minutes workflow
WAN 2.2 – First-to-Last-Frame Cinematic Workflow (with Radial/SparseSage Attention Patch)
Ultra-Stable 10-Second Videos Without Backlooping
Full Windows Guide (Triton • SpargeAttn • RadialAttn)
Runtime Requirements:
Windows • Python 3.10–3.11 • RTX 4060 Ti 8GB or higher
ComfyUI Version: 0.3.6+
WAN Version: WAN 2.2 GGUF (VAE + CLIP); you can switch to a different model if you prefer
⭐ What this workflow does
This project creates perfectly stable 10-second videos with:
No ping-pong
No frame collapse
No morphing artifacts
Perfect start → end interpolation
Full world-space motion (not pixel morphing)
High temporal stability
Cinematic camera motion
Optional film-VFI slow-motion
Optional x4 upscale + sharpening
Unlike default WAN or standard samplers, this version uses SparseSageAttn (SpargeAttn) + RadialAttn to extend WAN’s attention window from ~80 frames to 161+ frames.
This allows WAN to render a full 10 seconds as one consistent scene.
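The 161-frame figure follows from the frame rate. The sketch below is only an illustration: it assumes 16 fps output (the rate this description gives for WAN 2.2) and the 4n + 1 frame counts that WAN models commonly expect. The helper name `frames_for_duration` is hypothetical and not part of the workflow.

```python
def frames_for_duration(seconds: float, fps: int = 16) -> int:
    """Return a frame count of the form 4n + 1 that covers `seconds` at `fps`.

    Assumption: the model expects frame counts of 4n + 1 (e.g. 81, 161).
    """
    raw = round(seconds * fps)
    # Ceiling division: round the raw count up to a multiple of 4,
    # then add the single initial frame.
    n = -(-raw // 4)
    return 4 * n + 1


print(frames_for_duration(10))  # 161
print(frames_for_duration(5))   # 81
```

So a 10-second clip at 16 fps needs 161 frames, roughly twice the ~80 frames a default setup handles in one pass.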
⭐ Features
True First-to-Last-Frame world building
SD3 “Shift” parameter (recommended: 50 for 10s clips)
Support for start + end images (24 fully controlled scenes)
Works for image-to-video and text-to-video
Smooth cinematic motion
Film grain, fog, monochrome stability
Optional ClearReality x4 Upscale
Optional Sharpen pass
Compatible with 8 GB GPUs
🔧 Installation (Windows)
Step 1 — Install Triton for Windows
WAN 2.2 + RadialAttn requires Triton.
Download the correct Windows wheel here:
https://github.com/woct0rdho/triton-windows/releases
Install inside your venv.
Step 2 — Install SparseSageAttn
Download the Windows wheel from:
https://github.com/woct0rdho/SpargeAttn/releases
Install inside your venv.
Step 3 — Install RadialAttn Node
Download from:
https://github.com/woct0rdho/ComfyUI-RadialAttn
Place inside your ComfyUI custom_nodes folder.
Step 4 — Restart ComfyUI
If the Sparse / Radial Attn patch loads correctly, the startup log will say:
“Using sparse_sage_attn as block_sparse_sage2_attn_cuda”
If you see this line, the patch is active.
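If the log line does not appear, a quick check is whether the wheels from Steps 1 and 2 are visible to the Python interpreter that runs ComfyUI. The following is a minimal sketch, not part of the workflow. The import names `triton` and `spas_sage_attn` are assumptions about what the wheels install; adjust them if your packages use different names.

```python
import importlib.util


def check_attention_stack(modules=("triton", "spas_sage_attn")):
    """Report which of the given modules can be found in this environment.

    The default module names are assumptions, not confirmed by the workflow.
    """
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, found in check_attention_stack().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

Run it with the same venv you installed the wheels into. A MISSING entry usually means the wheel went into a different Python environment.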
📸 How the workflow works
1. Shift Node (SD3-style conditioning)
Increasing SHIFT tells WAN:
“Keep the scene physically consistent over time.”
Recommended value, scaled by clip length:
SHIFT = seconds × 5
→ For 10 seconds: SHIFT = 50
This stabilizes the entire world motion.
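The rule of thumb above is simple enough to write down as a one-line helper. This is only a sketch of the arithmetic; the function name `recommended_shift` is hypothetical, and the factor of 5 is the recommendation given here, not a fixed property of WAN.

```python
def recommended_shift(seconds: float, per_second: float = 5.0) -> float:
    """Apply the SHIFT = seconds x 5 rule of thumb from this workflow."""
    return seconds * per_second


print(recommended_shift(10))  # 50.0
print(recommended_shift(5))   # 25.0
```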
2. First-to-Last Frame Sampler
Takes:
Start Frame (Image A)
End Frame (Image B)
And generates smooth world-space interpolation across 161 frames.
The attention patch extends WAN’s temporal memory so it no longer ping-pongs.
3. FILM VFI (Optional)
If enabled, this doubles or quadruples the frame rate by interpolating in-between frames.
Use it after the baseline render.
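Whether interpolation gives smoother motion or slow motion depends on the frame rate you set on the output. A rough sketch of that choice, assuming a 16 fps base render; the helper `vfi_playback` is hypothetical and does not correspond to any node in the workflow:

```python
def vfi_playback(multiplier: int, base_fps: int = 16, slow_motion: bool = False):
    """Return (output_fps, playback_speed) for a given VFI multiplier.

    Smooth mode: raise the output fps so the clip keeps its original duration.
    Slow-motion mode: keep the base fps so the clip plays `multiplier` times slower.
    """
    if slow_motion:
        return base_fps, 1 / multiplier
    return base_fps * multiplier, 1.0


print(vfi_playback(2))                    # (32, 1.0)
print(vfi_playback(4))                    # (64, 1.0)
print(vfi_playback(2, slow_motion=True))  # (16, 0.5)
```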
4. Upscale (Optional)
- Upscale Model = ClearReality x4
⭐ Important Tips
For 8GB GPUs:
Perform Upscale AFTER VFI to avoid VRAM OOM
Disable Upscale during testing (put Upscale + Sharpen in a Group Toggle)
🎥 How to use the workflow
1. Load your Start and End images
Use a pair per scene (A → B).
2. Insert your Cinematic Prompt
Example prompt structure:
“The video begins in a foggy German forest. The camera slowly glides forward along a muddy path. No morphing. This is a continuous world. At the end of the video, the camera reaches the abandoned village, keeping the same cinematic monochrome style.”
3. Set SHIFT = 50
(For 10 seconds)
4. Render First-to-Last
WAN will generate full 161-frame motion directly.
5. Optional: enable FILM VFI
For slow motion or smoother movement. Example: WAN 2.2 is trained at 16 fps, so with VFI x2 set the output to 32 fps, and with VFI x4 set it to 64 fps.
6. Optional: enable Upscale & Sharpen
For maximum clarity.
⭐ If you need help setting anything up
Copy-paste this entire description into ChatGPT and ask:
“I want to recreate this WAN 2.2 First-to-Last-Frame workflow exactly as described above (with Triton, SparseSageAttn, RadialAttn, SD3 SHIFT = 50, dual KSamplers, and the optional upscale/sharpen section).
Please help me rebuild this step by step in ComfyUI.”
Then it will guide you.
🏁 Final Notes
This workflow is designed for:
Cinematic travel shots
World-building
Stable long sequences
Scene-to-scene storytelling
Consistent motion with minimal artifacts
The combination of SHIFT, First-to-Last, and Attention Patching is what enables true 10-second scenes without looping.
Bonus Tip:
You can chain multiple scenes into one continuous film by connecting the “Last frame extractor” output of one scene to the “Start Image” input of the next scene. This ensures perfectly aligned transitions without re-loading images manually.
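When planning a longer film, it can help to lay out the scene pairs before wiring anything up. The sketch below only illustrates the chaining idea: each scene's end image becomes the next scene's start image. The function name and file names are hypothetical. In the actual workflow the next start image is the last rendered frame, which may differ slightly from the end image you supplied.

```python
def plan_scenes(keyframes):
    """Pair consecutive keyframes into (start, end) scenes."""
    if len(keyframes) < 2:
        raise ValueError("need at least two keyframes to form a scene")
    # zip the list against itself shifted by one: (k0, k1), (k1, k2), ...
    return list(zip(keyframes, keyframes[1:]))


# Hypothetical file names, for illustration only.
for start, end in plan_scenes(["forest.png", "village.png", "church.png"]):
    print(f"{start} -> {end}")
```

Three keyframes give two scenes, so a film with N scenes needs N + 1 keyframes.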