WAN 2.2 GGUF/start end frame/t2v/i2v 8gb/10 seconds to 4 minutes workflow

WAN 2.2 – First-to-Last-Frame Cinematic Workflow (with Radial/SparseSage Attention Patch)

Ultra-Stable 10-Second Videos Without Backlooping

Full Windows Guide (Triton • SpargeAttn • RadialAttn)

Runtime Requirements:

Windows • Python 3.10–3.11 • RTX 4060 Ti 8GB or higher

ComfyUI Version: 0.3.6+
WAN Version: WAN 2.2 GGUF (VAE + CLIP); you can swap in a different model if you prefer


What this workflow does

This project creates perfectly stable 10-second videos with:

  • No ping-pong

  • No frame collapse

  • No morphing artifacts

  • Perfect start → end interpolation

  • Full world-space motion (not pixel morphing)

  • High temporal stability

  • Cinematic camera motion

  • Optional film-VFI slow-motion

  • Optional x4 upscale + sharpening

Unlike default WAN or standard samplers, this version uses:

SparseSageAttn + RadialAttn
to extend WAN’s attention window from ~80 frames to 161+ frames.

This allows WAN to render full 10 seconds as one consistent scene.


Features

  • True First-to-Last-Frame world building

  • SD3 “Shift” parameter (recommended: 50 for 10s clips)

  • Support for start + end images (24 fully controlled scenes)

  • Works for image-to-video and text-to-video

  • Smooth cinematic motion

  • Film grain, fog, monochrome stability

  • Optional ClearReality x4 Upscale

  • Optional Sharpen pass

  • Compatible with 8 GB GPUs


🔧 Installation (Windows)

Step 1 — Install Triton for Windows

The WAN 2.2 + RadialAttn setup requires Triton.
Download the Windows wheel that matches your Python version here:
https://github.com/woct0rdho/triton-windows/releases

Install inside your venv.

Step 2 — Install SpargeAttn (SparseSageAttn)

Download the Windows wheel from:
https://github.com/woct0rdho/SpargeAttn/releases

Install inside your venv.

Step 3 — Install RadialAttn Node

Download from:
https://github.com/woct0rdho/ComfyUI-RadialAttn

Place inside your ComfyUI custom_nodes folder.

Step 4 — Restart ComfyUI

If SpargeAttn and RadialAttn load correctly, the startup log will say:
“Using sparse_sage_attn as block_sparse_sage2_attn_cuda”

If you see this line, the patch is active.
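
If the log line does not appear, a quick first check is whether the Python packages from Steps 1 and 2 are importable at all. The following is a minimal sketch, not part of the workflow itself. The import names `triton` and `spas_sage_attn` are assumptions about how the wheels register themselves; adjust them if your installed packages use different names. Run it with the same venv Python that launches ComfyUI.

```python
import importlib.util


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed import names for Triton and SpargeAttn (not confirmed by this guide).
REQUIRED = ["triton", "spas_sage_attn"]

missing = missing_modules(REQUIRED)
if missing:
    print("Not importable in this environment:", ", ".join(missing))
else:
    print("All required modules were found.")
```

A module that is found can still fail at runtime (for example, on a CUDA mismatch), so treat this only as a first sanity check.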


📸 How the workflow works

1. Shift Node (SD3-style conditioning)

Increasing SHIFT tells WAN:
“Keep the scene physically consistent over time.”

Recommended rule of thumb (scales with clip length):
SHIFT = seconds × 5
→ For 10 seconds: SHIFT = 50

This stabilizes the entire world motion.
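
The rule above is simple enough to compute by hand, but a small helper is handy when you test several clip lengths. This is only a sketch of the arithmetic. The function name is hypothetical, and the factor of 5 is this guide's recommendation, not a value defined by WAN or ComfyUI.

```python
def recommended_shift(seconds: float, per_second: float = 5.0) -> float:
    """Rule of thumb from this guide: SHIFT = seconds x 5."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return seconds * per_second


for length in (5, 8, 10):
    print(f"{length} s clip: SHIFT = {recommended_shift(length):g}")
```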


2. First-to-Last Frame Sampler

Takes:

  • Start Frame (Image A)

  • End Frame (Image B)

And generates smooth world-space interpolation across 161 frames.

The attention patch extends WAN’s temporal memory so it no longer ping-pongs.
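
The 161-frame figure matches a 10-second clip at WAN 2.2's 16 fps plus one frame for the starting image. Here is a minimal sketch of that arithmetic. The "+1" is inferred from the numbers in this guide rather than taken from WAN documentation, and the function name is hypothetical.

```python
WAN_FPS = 16  # WAN 2.2 frame rate as stated in this guide


def frame_count(seconds: float, fps: int = WAN_FPS) -> int:
    """Frames to request for a clip: seconds * fps, plus the start frame."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return int(round(seconds * fps)) + 1


print(frame_count(10))
```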


3. FILM VFI (Optional)

If enabled, this doubles or quadruples the frame rate smoothly.
Apply it after the baseline render.


4. Upscale (Optional)

  • Upscale Model = ClearReality x4

Important Tips

For 8GB GPUs:

  • Perform Upscale AFTER VFI to avoid running out of VRAM (OOM)

  • Disable Upscale during testing (put Upscale + Sharpen in a Group Toggle)


🎥 How to use the workflow

1. Load your Start and End images

Use a pair per scene (A → B).

2. Insert your Cinematic Prompt

Example prompt structure:

“The video begins in a foggy German forest. The camera slowly glides forward along a muddy path. No morphing. This is a continuous world. At the end of the video, the camera reaches the abandoned village, keeping the same cinematic monochrome style.”

3. Set SHIFT = 50

(For 10 seconds)

4. Render First-to-Last

WAN will generate full 161-frame motion directly.

5. Optional: enable FILM VFI

For slow motion or smoother movement. Example: WAN 2.2 is trained at 16 fps, so with VFI x2 set the output to 32 fps, and with VFI x4 set the output to 64 fps.
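
The frame-rate pairing above is the base rate times the VFI multiplier. This is a minimal sketch with a hypothetical helper name. It only computes the output frame rate to enter in the video save node and does not touch any ComfyUI API.

```python
BASE_FPS = 16  # WAN 2.2 frame rate as stated in this guide


def vfi_output_fps(multiplier: int, base_fps: int = BASE_FPS) -> int:
    """Output frame rate that keeps playback speed unchanged after VFI."""
    if multiplier < 1:
        raise ValueError("multiplier must be at least 1")
    return base_fps * multiplier


for m in (2, 4):
    print(f"VFI x{m}: set output to {vfi_output_fps(m)} fps")
```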

6. Optional: enable Upscale & Sharpen

For maximum clarity.


If you need help setting anything up

Copy-paste this entire description into ChatGPT and ask:

“I want to recreate this WAN 2.2 First-to-Last-Frame workflow exactly as described above (with Triton, SparseSageAttn, RadialAttn, SD3 SHIFT = 50, dual KSamplers, and the optional upscale/sharpen section).
Please help me rebuild this step by step in ComfyUI.”

It will then guide you through the setup.


🏁 Final Notes

This workflow is designed for:

  • Cinematic travel shots

  • World-building

  • Stable long sequences

  • Scene-to-scene storytelling

  • Consistent motion with minimal artifacts

The combination of SHIFT, First-to-Last, and Attention Patching is what enables true 10-second scenes without looping.


Bonus Tip:
You can chain multiple scenes into one continuous film by connecting the “Last frame extractor” output of one scene to the “Start Image” input of the next scene. This ensures perfectly aligned transitions without re-loading images manually.
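
To plan a chained film before wiring the nodes, it can help to list which image feeds each scene. The sketch below is purely illustrative. The function name, file names, and labels are hypothetical, and it only builds a plan as text; it does not render anything or call ComfyUI.

```python
from typing import List, Tuple


def plan_chain(first_start: str, end_images: List[str]) -> List[Tuple[str, str]]:
    """Return (start, end) labels for each scene of a chained film.

    Scene 1 starts from the given image. Every later scene starts from
    the rendered last frame of the scene before it.
    """
    scenes = []
    start = first_start
    for index, end in enumerate(end_images, start=1):
        scenes.append((start, end))
        start = f"last frame of scene {index}"
    return scenes


for number, (start, end) in enumerate(plan_chain("forest.png", ["village.png", "church.png"]), start=1):
    print(f"Scene {number}: {start} -> {end}")
```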
