Kiko9 WAN 2.1 Native (ComfyUI)

🧠 Kiko9 ComfyUI WAN 2.1 Native Workflow

ComfyUI image-to-video (I2V) pipeline built around WAN 2.1 using native ComfyUI and Torch compilation (torch.compile) for performance gains. The design includes 2-pass generation, frame interpolation, upscaling, and slow motion — tailored for high-fidelity AI-enhanced video generation.

Link to workflow I use for start image:


📦 Workflow Overview


🛠️ Project Breakdown

🔧 Project Settings

  • Project File Path Generator: Allows saving outputs with a defined base path. Set this to your local output folder.

    • User Action: Update root_path to your preferred save location.
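The sketch below shows one way a base root_path might be combined into a per-run output prefix. It rests on several assumptions. The function name `build_output_prefix` is made up for illustration. The layout (project folder, date folder, filename stem) is also illustrative. The actual Project File Path Generator node may build its paths differently.

```python
from datetime import date
from pathlib import PurePosixPath


def build_output_prefix(root_path: str, project: str, run_date: date, stem: str) -> str:
    """Hypothetical layout: <root_path>/<project>/<YYYY-MM-DD>/<stem>."""
    # PurePosixPath gives a stable forward-slash string for this example on any OS.
    return str(PurePosixPath(root_path, project, run_date.isoformat(), stem))


if __name__ == "__main__":
    prefix = build_output_prefix("ComfyUI/output/generated", "wan21_i2v", date(2025, 1, 31), "pass1")
    print(prefix)  # ComfyUI/output/generated/wan21_i2v/2025-01-31/pass1
```

Grouping runs by date keeps 1st-pass, 2nd-pass and upscaled outputs of the same session together.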


🧮 Aspect Ratio Logic (Don't Touch)

  • Calculates width and height from the input image size, using a float-to-int conversion to preserve the aspect ratio.

    • ⚠️ Do not modify unless you understand aspect ratio propagation.
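Here is a minimal sketch of this kind of aspect-ratio logic. It scales the source image to a 480 px target width, keeps the height-to-width ratio, and converts the float result to an int. The snap to a multiple of 16 is an assumption: many video models expect dimensions divisible by 16, but the source does not say whether these nodes snap. The name `fit_to_width` is hypothetical.

```python
from typing import Tuple


def fit_to_width(src_w: int, src_h: int, target_w: int = 480, multiple: int = 16) -> Tuple[int, int]:
    """Scale (src_w, src_h) to target_w wide while keeping the aspect ratio.

    Snapping to `multiple` is an assumption, not confirmed for this workflow.
    """
    if src_w <= 0 or src_h <= 0:
        raise ValueError("image size must be positive")
    ratio = src_h / src_w                      # float aspect ratio (height per unit width)
    height = int(round(target_w * ratio))      # float -> int conversion
    width = (target_w // multiple) * multiple  # snap width down to the grid
    height = max(multiple, (height // multiple) * multiple)  # snap height down, never to 0
    return width, height


if __name__ == "__main__":
    print(fit_to_width(960, 1664))   # (480, 832)
    print(fit_to_width(1024, 1536))  # (480, 720)
```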


📸 Image Generation for Video (Optimized Resolution)

  • When generating the start image (the first video frame) with tools like FLUX or SDXL, generate at the right resolution to keep the video sharp and consistent.

🎯 Target Video Resolution

  • Target Size: 480x832

  • Aspect Ratio: 480 ÷ 832 ≈ 0.577

✅ Ideal Generation Resolution

To preserve details and allow for high-quality downscaling, generate at 2x or higher resolution. A perfect match in aspect ratio ensures you avoid cropping or distortion.

Gen Resolution | Aspect Ratio | Notes
960x1664 | 960 ÷ 1664 ≈ 0.577 | ✅ Perfect aspect ratio match
1024x1536 | 1024 ÷ 1536 ≈ 0.6667 | 🔶 Noticeable mismatch: crop (~138 px of width) or padding needed
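The table's ratios and the crop needed for the 1024x1536 row can be checked with exact fractions. The sketch below computes the size of the largest crop whose aspect ratio matches 480x832 (15/26). The name `crop_to_target` is hypothetical. Exact `Fraction` arithmetic avoids float rounding when deciding whether a size already matches.

```python
from fractions import Fraction
from typing import Tuple

TARGET = Fraction(480, 832)  # target video aspect ratio (width / height), reduces to 15/26


def crop_to_target(w: int, h: int, target: Fraction = TARGET) -> Tuple[int, int]:
    """Size of the largest crop of a w x h image whose aspect ratio equals `target`."""
    if Fraction(w, h) > target:   # too wide: keep the height, trim the width
        return int(h * target), h
    return w, int(w / target)     # too tall (or exact match): keep the width, trim the height


if __name__ == "__main__":
    for w, h in [(960, 1664), (1024, 1536)]:
        cw, ch = crop_to_target(w, h)
        print(f"{w}x{h}: ratio {w / h:.4f} -> {cw}x{ch} ({w - cw} px width, {h - ch} px height removed)")
```

For 960x1664 nothing is removed. For 1024x1536 the crop is 886x1536, so about 13% of the width is lost.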

🔄 Workflow

  1. Generate high-res images: use 960x1664 or larger at the same aspect ratio, with FLUX, SDXL, etc.

🧮 Why This Works

  • High-res generation reduces artifacts and increases fidelity.

  • Downscaling averages pixels, smoothing jagged edges and noise.

  • Maintaining the same aspect ratio avoids warping or unnecessary padding.
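To illustrate the "downscaling averages pixels" point, here is a box-filter sketch. Going from 960x1664 to 480x832 is exactly 2x on each axis, so each output pixel can be the mean of one 2x2 input block. This is an illustration only. Real resize nodes often use other filters (area, bicubic, Lanczos), and the source does not say which one this workflow uses.

```python
from typing import List

Image = List[List[float]]  # grayscale image stored as rows of pixel values


def downscale_2x(img: Image) -> Image:
    """Exact 2x downscale: each output pixel is the mean of one 2x2 input block."""
    h, w = len(img), len(img[0])
    if h % 2 or w % 2:
        raise ValueError("dimensions must be even for an exact 2x downscale")
    return [
        [(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4 for x in range(0, w, 2)]
        for y in range(0, h, 2)
    ]


if __name__ == "__main__":
    jagged = [
        [0, 255, 0, 255],  # high-frequency, "jagged" pattern
        [255, 0, 255, 0],
        [10, 20, 30, 40],
        [50, 60, 70, 80],
    ]
    print(downscale_2x(jagged))  # [[127.5, 127.5], [35.0, 55.0]]
```

In the example, the alternating 0/255 pattern collapses to a flat 127.5. This is the smoothing of jagged edges and noise described above.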


📥 Loaders

  • Load Checkpoint (WAN2.1): Loads the WAN 2.1 native (ComfyUI) model checkpoint.

  • VAE & CLIP Loader: Loads required VAE and CLIP encoders.

  • Power LoRA Loader (optional): Loads one or more LoRAs on top of the WAN 2.1 model.

  • Tile Cache, Enhance, and CLIP Vision: Loads auxiliary models.

    • User Action:

      • Set ckpt_name, vae_name, and clip_name according to local model files.

      • Ensure files are in your configured ComfyUI model folders.
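Before queuing the workflow, a quick check that the model files are where the loaders expect them can save a failed run. This sketch assumes the `models/<subfolder>/<file>` layout from the folder structure example. The file names in `REQUIRED` are placeholders: replace them with your local ckpt_name, vae_name and clip_name values.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple

# Placeholder names: substitute the files your loaders actually reference.
REQUIRED: List[Tuple[str, str]] = [
    ("checkpoints", "your_wan21_checkpoint.safetensors"),
    ("vae", "your_wan21_vae.safetensors"),
    ("clip", "your_text_encoder.safetensors"),
]


def missing_models(comfy_root: str, required: List[Tuple[str, str]]) -> List[str]:
    """Return 'subfolder/file' entries that do not exist under <comfy_root>/models."""
    models_dir = Path(comfy_root) / "models"
    return [f"{sub}/{name}" for sub, name in required if not (models_dir / sub / name).is_file()]


if __name__ == "__main__":
    # Demo against a throwaway folder containing only the VAE file.
    with tempfile.TemporaryDirectory() as tmp:
        vae_dir = Path(tmp) / "models" / "vae"
        vae_dir.mkdir(parents=True)
        (vae_dir / "your_wan21_vae.safetensors").touch()
        print(missing_models(tmp, REQUIRED))
```

Point `comfy_root` at your real ComfyUI folder to check an actual install.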


🖼️ Image / Resize

  • Load Image / Resize: Loads the input image or the first frame of a video clip and resizes it to model-appropriate dimensions.


🌍 Global Settings

  • CLIP Text Encode (Prompt & Negative): Prompts for conditioning the model.

    • User Action: Customize these prompts per your subject/style.

  • Seed Generator / Upscale Factor: Controls random seed and image scale-up.

    • User Action: Set seed for reproducibility or leave -1 for random.
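The "-1 means random" convention can be sketched as follows. This assumes a 64-bit unsigned seed range, and `resolve_seed` is a hypothetical name, not the seed node's actual code. The seed that was actually used should be logged, because that value is what makes a good result reproducible.

```python
import random
from typing import Optional

MAX_SEED = 2**64 - 1  # assumption: 64-bit unsigned seed range


def resolve_seed(seed: int, rng: Optional[random.Random] = None) -> int:
    """-1 means 'pick a random seed'; any other in-range value is used as-is."""
    if seed == -1:
        rng = rng or random.Random()
        return rng.randint(0, MAX_SEED)
    if not 0 <= seed <= MAX_SEED:
        raise ValueError(f"seed must be -1 or in [0, {MAX_SEED}]")
    return seed


if __name__ == "__main__":
    chosen = resolve_seed(-1)
    print(f"seed used: {chosen}")  # record this to re-render the same video later
```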


🔁 1st Pass (Initial Generation)

  • KSampler: Runs the initial inference.

  • VAE Decode & Video Combine: Decodes the latents into image frames and combines them into a video.

  • Slow Motion / PlaySound: Optional slow-motion settings, plus a sound notification when the pass finishes.

  • Select the last frame to use as the 2nd-pass start frame (via a pop-up window).


🔁 2nd Pass (Refine & Extend)

  • Similar to 1st Pass but optimized for longer inference or higher quality.

  • Takes the last frame of the 1st pass as the 2nd-pass starting image.

  • Get Mask Range From Clip: Extracts mask regions for attention.

  • Image Batch Multi: Processes multiple frames simultaneously.
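The pass-chaining idea can be sketched with plain lists standing in for image batches. The last frame of pass 1 seeds pass 2. When the clips are joined, the 2nd pass's first frame is dropped. That drop is an assumption: because pass 2 starts from the same image, its first frame is nearly a duplicate and would cause a one-frame stutter at the seam. The function names are hypothetical.

```python
from typing import List, Sequence, TypeVar

Frame = TypeVar("Frame")


def last_frame(frames: Sequence[Frame]) -> Frame:
    """Start image for the next pass: the final frame of the previous one."""
    if not frames:
        raise ValueError("previous pass produced no frames")
    return frames[-1]


def join_passes(first: Sequence[Frame], second: Sequence[Frame]) -> List[Frame]:
    """Concatenate two passes, dropping second[0] (assumed duplicate of first[-1])."""
    return list(first) + list(second[1:])


if __name__ == "__main__":
    pass1 = ["a0", "a1", "a2"]
    start = last_frame(pass1)         # "a2" seeds the 2nd pass
    pass2 = [start, "b1", "b2"]       # stand-in for frames generated from `start`
    print(join_passes(pass1, pass2))  # ['a0', 'a1', 'a2', 'b1', 'b2']
```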


📈 Upscaling & Frame Interpolation

  • Image Sharpen / Restore Faces: Post-processing enhancements.

  • Upscale Image (Real-ESRGAN or similar).

  • Frame Interpolation (RIFE / FILM): Smooth transitions for higher FPS.

  • Slow Motion: Optional, adds frames and blends for cinematic slow-mo.
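Frame interpolation and slow motion draw on the same extra frames, and the arithmetic shows why. This sketch assumes the interpolator inserts (multiplier - 1) frames between each consecutive pair and keeps both endpoints. Some interpolator nodes return n × multiplier instead, so real counts may differ by a frame. The frame count, FPS and multiplier below are example values only.

```python
def interpolated_count(n: int, multiplier: int) -> int:
    """Frames after inserting (multiplier - 1) frames between each consecutive pair."""
    if n < 1 or multiplier < 1:
        raise ValueError("n and multiplier must be >= 1")
    return (n - 1) * multiplier + 1


def duration_s(frames: int, fps: float) -> float:
    """Playback length in seconds."""
    return frames / fps


if __name__ == "__main__":
    n, fps, m = 81, 16, 2            # example values, not taken from the workflow
    out = interpolated_count(n, m)   # 161 frames
    print("original:", duration_s(n, fps))        # 5.0625 s
    print("smoother:", duration_s(out, fps * m))  # 5.03125 s: same length, double FPS
    print("slow-mo :", duration_s(out, fps))      # 10.0625 s: same FPS, twice as long
```

The new frames can either raise the FPS at the same duration or stretch the clip at the same FPS. Stacking both effects unintentionally is one way to end up with the laggy output described in the FAQ.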


🧪 Experimental (Optional, Long Runtime)

  • Advanced enhancement or second-stage denoising/refinement.

  • Useful for batch rendering with very high quality needs.

    • ⏱️ Warning: These steps significantly increase processing time.


⚡ Torch Compile Setup (VERY IMPORTANT)

To unlock native acceleration via torch.compile, ensure you meet these requirements:

✅ Requirements

  • PyTorch 2.1+ with CUDA

  • NVIDIA GPU with Ampere or later architecture (RTX 30XX, 40XX)

  • Use the latest nightly ComfyUI, or apply torch.compile() patching manually.
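The requirements above can be checked before enabling compilation. Ampere and newer NVIDIA GPUs report CUDA compute capability 8.0 or higher (RTX 30XX is 8.6, RTX 40XX is 8.9). The check itself is a pure function so it can be tested without a GPU. `probe()` only uses `torch.__version__`, `torch.cuda.is_available()` and `torch.cuda.get_device_capability()`, and it returns False if PyTorch is not installed. The helper names are hypothetical.

```python
import re
from typing import Optional, Tuple


def parse_major_minor(version: str) -> Tuple[int, int]:
    """'2.1.0+cu121' -> (2, 1)."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if not match:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def meets_compile_requirements(torch_version: str, cuda_available: bool,
                               capability: Optional[Tuple[int, int]]) -> bool:
    """PyTorch 2.1+, CUDA present, and compute capability >= 8.0 (Ampere or newer)."""
    return (parse_major_minor(torch_version) >= (2, 1)
            and cuda_available
            and capability is not None
            and capability >= (8, 0))


def probe() -> bool:
    try:
        import torch
    except ImportError:
        return False
    cuda = torch.cuda.is_available()
    capability = torch.cuda.get_device_capability(0) if cuda else None
    return meets_compile_requirements(str(torch.__version__), cuda, capability)


if __name__ == "__main__":
    print("torch.compile requirements met:", probe())
```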


💾 Saving Outputs

  • Controlled via Project Path Generator and Video Combine nodes.

  • Output format (e.g. .mp4, .png, .webm) should be explicitly set in Video Combine.


📋 Notes

  • ⚠️ The first run with torch.compile is slow because the model graph is traced and compiled; later runs with the same settings reuse the compiled code.

  • 🧠 Prompt tuning is crucial for WAN 2.1 — try detailed descriptions.

  • ⚠️ Not optimized for older machines.


🙋 FAQ

Q: My output is laggy or missing frames.

  • Check interpolation settings and slow motion settings — disable one if not needed.

Q: Workflow crashes during torch compile.

  • Ensure you're using PyTorch 2.1+, and your GPU is Ampere or newer.

Q: Can I use this with other models like SDXL?

  • You can try, but the workflow is built and tuned around WAN 2.1, so results may vary.


📎 Credits

  • Workflow design by Kiko9

  • WAN 2.1 video model by Alibaba's Wan team

  • ComfyUI team for the powerful modular engine


📂 Folder Structure Example

ComfyUI/
├── models/
│   ├── checkpoints/
│   ├── vae/
│   └── clip/
├── output/
│   └── generated/
└── custom_nodes/


📊 End-to-End WAN 2.1 Generation Summary

Step | Description | Time / Count | Resolution
Prompt Start | Initial prompt execution begins | 92.95 sec | -
Model Load | Loaded WAN21 model weights | ~15,952 ms | -
First Comfy-VFI Pass | Generated frames with TeaCache initialized | ~6 min 13 sec | 480x832
Frames Generated (1st pass) | Comfy-VFI output | 231 frames | 480x832
Second Comfy-VFI Pass | Repeats generation with same steps | ~6 min 28 sec | 480x832
Frames Generated (2nd pass) | Comfy-VFI output | (implied) | 480x832
WanVAE Load (1st) | Loaded latent space model | ~1220 ms | -
WanVAE Load (2nd) | Loaded again for reuse | ~1304 ms | -
Face Restoration (GFPGAN) | GFPGANv1.4 restored images | 152 frames | 512x512
Comfy-VFI Run (3rd) | Generated additional frames | ~unknown | 960x1664
Frames Generated (3rd pass) | Comfy-VFI output | 456 frames | 960x1664
Comfy-VFI Run (4th) | Final batch of generation | ~unknown | 960x1664
Frames Generated (4th pass) | Comfy-VFI output | 304 frames | 960x1664
Prompt End | Final step of pipeline | 1050.60 sec | -

ℹ️ Notes:

  • TeaCache skipped 12 of the 30 conditional and 12 of the 30 unconditional steps, i.e. 24 of 60 model evaluations (~40% fewer).

  • Face restoration step was applied to a subset (152 frames).

  • The 960x1664 resolution used in the last two passes matches the 480x832 aspect ratio perfectly, ideal for downscaling or 2x video output.
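The TeaCache note can be made concrete with a simplified sketch of its skipping rule. At each sampling step, the relative change of the model input versus the previous step is added to a running total. While the total stays below a threshold, the cached output is reused; otherwise the model runs and the total resets. The first and last steps always run. This sketch omits the model-specific polynomial rescaling that TeaCache applies. The threshold and change values are made-up examples, not values from this workflow's log.

```python
from typing import List, Sequence


def teacache_schedule(rel_changes: Sequence[float], threshold: float) -> List[bool]:
    """True = evaluate the model at this step, False = reuse the cached output.

    rel_changes[i] is the relative L1 change of the model input at step i versus
    step i - 1 (rel_changes[0] is unused because the first step always runs).
    """
    n = len(rel_changes)
    schedule: List[bool] = []
    accumulated = 0.0
    for i, change in enumerate(rel_changes):
        if i == 0 or i == n - 1:        # always compute the first and last steps
            schedule.append(True)
            accumulated = 0.0
            continue
        accumulated += change
        if accumulated < threshold:     # input barely moved: reuse the cache
            schedule.append(False)
        else:                           # drift too large: run the model, reset
            schedule.append(True)
            accumulated = 0.0
    return schedule


if __name__ == "__main__":
    plan = teacache_schedule([0, 0.05, 0.05, 0.05, 0.2, 0.05, 0.3], threshold=0.12)
    print(plan)  # [True, False, False, True, True, False, True]
    print(sum(not step for step in plan), "of", len(plan), "steps reused from cache")
```

With classifier-free guidance, the conditional and unconditional branches each make these decisions. That is why the log counts skipped conditional and unconditional steps separately.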

🗨️ Feedback & Contributions

Feel free to submit issues if you encounter bugs or want to contribute improvements.


🔥 Happy rendering!
