Kiko9 WAN 2.1 Native (ComfyUI)
🧠 Kiko9 ComfyUI WAN 2.1 Native Workflow
ComfyUI image-to-video (I2V) pipeline built around WAN 2.1 using native ComfyUI and Torch compilation (torch.compile) for performance gains. The design includes 2-pass generation, frame interpolation, upscaling, and slow motion — tailored for high-fidelity AI-enhanced video generation.
Link to workflow I use for start image:
📦 Workflow Overview
🛠️ Project Breakdown
🔧 Project Settings
Project File Path Generator: Allows saving outputs with a defined base path. Set this to your local output folder.
✅ User Action: Update root_path to your preferred save location.
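As a rough illustration of what a base-path setting like root_path controls, the sketch below builds a dated output folder under a root. The function name, the project name, and the folder layout are hypothetical; the actual Project File Path Generator node may organize its outputs differently.

```python
from datetime import datetime
from pathlib import Path


def build_output_path(root_path: str, project: str, when: datetime) -> Path:
    """Return <root_path>/<project>/<YYYY-MM-DD> (hypothetical layout)."""
    return Path(root_path) / project / when.strftime("%Y-%m-%d")


# Example values only; point root_path at your own output folder.
out_dir = build_output_path("output/generated", "wan21_i2v", datetime(2025, 1, 31))
print(out_dir.as_posix())
```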
🧮 Aspect Ratio Logic (Don't Touch)
Calculates width and height from the image size, using a float-to-int conversion to maintain the aspect ratio.
⚠️ Do not modify unless you understand aspect ratio propagation.
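The float-to-int step can be sketched roughly as follows. This is not the workflow's actual node graph: the function name, the 832-pixel long side, and snapping to a multiple of 16 are assumptions made for illustration.

```python
def fit_dimensions(src_w: int, src_h: int,
                   target_long_side: int = 832,
                   multiple: int = 16) -> tuple[int, int]:
    """Scale (src_w, src_h) so the long side equals target_long_side.

    Both sides are computed as floats, then converted to ints snapped
    to `multiple` (an assumed constraint, not taken from the workflow).
    """
    scale = target_long_side / max(src_w, src_h)

    def snap(value: float) -> int:
        return max(multiple, int(round(value / multiple)) * multiple)

    return snap(src_w * scale), snap(src_h * scale)


print(fit_dimensions(960, 1664))   # (480, 832)
print(fit_dimensions(1024, 1536))  # (560, 832)
```

Snapping after scaling means the output ratio can drift slightly from the source ratio when the source does not divide evenly, which is one reason to generate start images at an exact multiple of the target size.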
📸 Image Generation for Video (Optimized Resolution)
When creating video frames using image generation tools like FLUX / SDXL, it's important to generate at the right resolution to maintain sharpness and consistency.
🎯 Target Video Resolution
Target Size: 480x832
Aspect Ratio: 480 ÷ 832 ≈ 0.577
✅ Ideal Generation Resolution
To preserve details and allow for high-quality downscaling, generate at 2x or higher resolution. A perfect match in aspect ratio ensures you avoid cropping or distortion.
| Gen Resolution | Aspect Ratio | Notes |
| --- | --- | --- |
| 960x1664 | 960 ÷ 1664 ≈ 0.577 | ✅ Perfect aspect ratio match |
| 1024x1536 | 1024 ÷ 1536 ≈ 0.6667 | 🔶 Slight crop or padding needed |
🔄 Workflow
Generate High-Res Images: Use 960x1664 or larger with the same aspect ratio, using FLUX, SDXL, etc.
🧮 Why This Works
High-res generation reduces artifacts and increases fidelity.
Downscaling averages pixels, smoothing jagged edges and noise.
Maintaining the same aspect ratio avoids warping or unnecessary padding.
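To check whether a generation resolution matches the 480x832 target, and how much would need to be cropped if it does not, a small sketch like the one below can help. The helper names are hypothetical; exact fractions are used so that a perfect match is not hidden by rounding.

```python
from fractions import Fraction

TARGET = (480, 832)  # target video resolution from this workflow


def ratio(width: int, height: int) -> Fraction:
    """Exact width/height aspect ratio."""
    return Fraction(width, height)


def crop_to_match(gen_w: int, gen_h: int, target=TARGET) -> tuple[int, int]:
    """Largest (w, h) crop of the generated image that has the target ratio."""
    target_ratio = ratio(*target)
    if ratio(gen_w, gen_h) > target_ratio:
        # Image is too wide: keep the height, trim the width.
        return int(gen_h * target_ratio), gen_h
    # Image is too tall or an exact match: keep the width, trim the height.
    return gen_w, int(gen_w / target_ratio)


print(crop_to_match(960, 1664))   # (960, 1664): no crop needed
print(crop_to_match(1024, 1536))  # (886, 1536): width must be trimmed
```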
📥 Loaders
Load Checkpoint (WAN2.1): Load the WAN 2.1 native (ComfyUI) model checkpoint.
VAE & CLIP Loader: Loads required VAE and CLIP encoders.
Power LoRA Loader (optional): Loads optional LoRA weights.
Tile Cache, Enhance, and CLIP Vision: Load auxiliary models.
✅ User Action:
Set ckpt_name, vae_name, and clip_name according to your local model files.
Ensure the files are in your configured ComfyUI model folders.
🖼️ Image / Resize
Load Image / Resize: Loads the input image or first frame from a video clip, resizes it to model-appropriate dimensions.
🌍 Global Settings
CLIP Text Encode (Prompt & Negative): Prompts for conditioning the model.
✅ User Action: Customize these prompts per your subject/style.
Seed Generator / Upscale Factor: Controls random seed and image scale-up.
✅ User Action: Set seed for reproducibility, or leave it at -1 for random.
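The "-1 means random" convention can be sketched as below. The function name and the 32-bit upper bound are assumptions for illustration; the seed node in the workflow may use a different range.

```python
import random

MAX_SEED = 2**32 - 1  # assumed upper bound, not taken from the workflow


def resolve_seed(seed: int) -> int:
    """Return the seed unchanged, or draw a random one when seed is -1."""
    if seed == -1:
        return random.randint(0, MAX_SEED)
    return seed


print(resolve_seed(1234))  # fixed seed: same value every run
print(resolve_seed(-1))    # random seed: differs between runs
```

Logging the resolved seed alongside each output makes it possible to reproduce a random run later.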
🔁 1st Pass (Initial Generation)
KSampler: Runs the initial inference.
VAE Decode & Video Combine: Decodes latent space to image, combines with source.
Slow Motion / PlaySound: Optional audio sync and slow-mo settings.
Select the last frame as the start frame for the 2nd pass (a pop-up window appears).
🔁 2nd Pass (Refine & Extend)
Similar to 1st Pass but optimized for longer inference or higher quality.
Take last frame from 1st pass as 2nd pass starting image.
Get Mask Range From Clip: Extracts mask regions for attention.
Image Batch Multi: Processes multiple frames simultaneously.
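The hand-off between the two passes can be pictured with plain Python lists standing in for frame batches. This is only a conceptual sketch: the function name is hypothetical, and dropping a duplicated start frame when joining the passes is an assumption, not something the workflow description confirms.

```python
def chain_passes(first_pass: list, second_pass: list) -> list:
    """Join two passes, dropping the second pass's first frame if it
    repeats the last frame of the first pass (assumed behavior)."""
    if first_pass and second_pass and second_pass[0] == first_pass[-1]:
        second_pass = second_pass[1:]
    return first_pass + second_pass


first = ["f0", "f1", "f2"]           # frames from the 1st pass
start_image = first[-1]              # last frame becomes the 2nd pass start image
second = [start_image, "f3", "f4"]   # 2nd pass begins from that image
clip = chain_passes(first, second)
print(clip)  # ['f0', 'f1', 'f2', 'f3', 'f4']
```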
📈 Upscaling & Frame Interpolation
Image Sharpen / Restore Faces: Post-processing enhancements.
Upscale Image (Real-ESRGAN or similar).
Frame Interpolation (RIFE / FILM): Smooth transitions for higher FPS.
Slow Motion: Optional, adds frames and blends for cinematic slow-mo.
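How interpolation and slow motion interact is mostly frame-count arithmetic. The sketch below assumes an interpolator that inserts frames between each neighboring pair, and uses 81 frames at 16 fps purely as example values; the actual nodes may count frames differently.

```python
def interpolated_frame_count(frames: int, multiplier: int) -> int:
    """Frame count after inserting (multiplier - 1) frames between each pair."""
    if frames < 2:
        return frames
    return (frames - 1) * multiplier + 1


def clip_duration(frames: int, fps: float) -> float:
    """Playback length in seconds."""
    return frames / fps


src_frames, src_fps = 81, 16  # example values, not taken from the workflow
smooth = interpolated_frame_count(src_frames, 2)

print(smooth)                              # 161
print(clip_duration(smooth, src_fps * 2))  # 5.03125 s: smoother, same speed
print(clip_duration(smooth, src_fps))      # 10.0625 s: slow motion at the original fps
```

Doubling the frames and doubling the fps gives smoother motion at the same speed; doubling the frames while keeping the fps gives slow motion. Applying both effects without adjusting the fps is one way a clip can end up looking laggy.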
🧪 Experimental (Optional, Long Runtime)
Advanced enhancement or second-stage denoising/refinement.
Useful for batch rendering with very high quality needs.
⏱️ Warning: These steps significantly increase processing time.
⚡ Torch Compile Setup (VERY IMPORTANT)
To unlock native acceleration via torch.compile, ensure you meet these requirements:
✅ Requirements
PyTorch 2.1+ with CUDA
NVIDIA GPU with Ampere or later architecture (RTX 30XX, 40XX)
Use the latest nightly ComfyUI or manually apply torch.compile() patching.
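The first two requirements can be checked before launching the workflow with a short script along these lines. The helper names are hypothetical; the only PyTorch calls used are torch.__version__, torch.cuda.is_available(), and torch.cuda.get_device_capability(). Ampere GPUs report compute capability 8.0 or higher.

```python
def parse_version(version: str) -> tuple[int, int]:
    """Turn a version string such as '2.3.1+cu121' into (2, 3)."""
    major, minor = version.split("+")[0].split(".")[:2]
    return int(major), int(minor)


def meets_requirements(torch_version: str, capability) -> bool:
    """True if PyTorch is 2.1+ and the GPU is Ampere (8.0) or newer.

    `capability` is a (major, minor) tuple, or None when no CUDA GPU is found.
    """
    if capability is None:
        return False
    return parse_version(torch_version) >= (2, 1) and tuple(capability) >= (8, 0)


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
    else:
        cap = torch.cuda.get_device_capability() if torch.cuda.is_available() else None
        print("torch", torch.__version__, "capability", cap)
        print("OK for torch.compile:", meets_requirements(torch.__version__, cap))
```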
💾 Saving Outputs
Controlled via Project Path Generator and Video Combine nodes.
Output format (e.g. .mp4, .png, .webm) should be explicitly set in Video Combine.
📋 Notes
⚠️ First run of torch.compile will be slow due to graph tracing.
🧠 Prompt tuning is crucial for WAN 2.1 — try detailed descriptions.
⚠️ Not optimized for older machines.
🙋 FAQ
Q: My output is laggy or missing frames.
Check interpolation settings and slow motion settings — disable one if not needed.
Q: Workflow crashes during torch compile.
Ensure you're using PyTorch 2.1+, and your GPU is Ampere or newer.
Q: Can I use this with other models like SDXL?
You can, but WAN 2.1 is optimized for this specific setup. Results may vary.
📎 Credits
Workflow design by Kiko9
WAN 2.1
ComfyUI team for the powerful modular engine
📂 Folder Structure Example
ComfyUI/
├── models/
│   ├── checkpoints/
│   ├── vae/
│   └── clip/
├── output/
│   └── generated/
└── custom_nodes/
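A quick way to confirm that a local install matches the layout above is to list whichever folders are missing. This is a minimal sketch: the function name is hypothetical, and the "ComfyUI" root path is a placeholder for your own install location.

```python
from pathlib import Path

# Folders taken from the example structure above.
EXPECTED = [
    "models/checkpoints",
    "models/vae",
    "models/clip",
    "output/generated",
    "custom_nodes",
]


def missing_folders(comfy_root: Path) -> list[str]:
    """Return the expected sub-folders that do not exist under comfy_root."""
    return [rel for rel in EXPECTED if not (comfy_root / rel).is_dir()]


# Placeholder root; replace with the path to your ComfyUI install.
print(missing_folders(Path("ComfyUI")))
```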
📊 End-to-End WAN 2.1 Generation Summary
| Step | Description | Time / Count | Resolution |
| --- | --- | --- | --- |
| Prompt Start | Initial prompt execution begins | 92.95 sec | |
| Model Load | Loaded WAN21 model weights | ~15,952 ms | |
| First Comfy-VFI Pass | Generated frames with TeaCache initialized | ~6 min 13 sec | 480x832 |
| Frames Generated (1st pass) | Comfy-VFI output | 231 frames | 480x832 |
| Second Comfy-VFI Pass | Repeats generation with same steps | ~6 min 28 sec | 480x832 |
| Frames Generated (2nd pass) | Comfy-VFI output | (Implied) | 480x832 |
| WanVAE Load (1st) | Loaded latent space model | ~1220 ms | — |
| WanVAE Load (2nd) | Loaded again for reuse | ~1304 ms | — |
| Face Restoration (GFPGAN) | GFPGANv1.4 restored images | 152 frames | 512x512 |
| Comfy-VFI Run (3rd) | Generated additional frames | unknown | 960x1664 |
| Frames Generated (3rd pass) | Comfy-VFI output | 456 frames | 960x1664 |
| Comfy-VFI Run (4th) | Final batch of generation | unknown | 960x1664 |
| Frames Generated (4th pass) | Comfy-VFI output | 304 frames | 960x1664 |
| Prompt End | Final step of pipeline | 1050.60 sec | — |
ℹ️ Notes:
The "TeaCache skipped" log line reports 12 conditional + 12 unconditional steps skipped per 30, roughly a ~20% optimization.
Face restoration step was applied to a subset (152 frames).
The 960x1664 resolution used in the last two passes matches the 480x832 aspect ratio perfectly, ideal for downscaling or 2x video output.
🗨️ Feedback & Contributions
Feel free to submit issues if you encounter bugs or want to contribute improvements.
🔥 Happy rendering!

