WAN 2.2 14B Multi-Phase I2V/T2V Workflow
WAN 2.2 14B Multi-Phase I2V/T2V Workflow: Professional Video Generation Made Accessible
Supports both Image-to-Video and Text-to-Video generation with the same optimized architecture
Breaking Through Hardware Limitations
Have you ever wanted to create longer, higher-quality AI videos but kept running into memory errors? This workflow solves that problem by splitting video generation into four independent phases that work together like a relay race. Each phase does its job, cleans up after itself, and passes the baton to the next runner.
What's New: This workflow includes a custom-built WanSettingsController node that centralizes all video settings into one control point, eliminating the tedious process of manually updating dozens of nodes when you want to change resolution or aspect ratio.
What Makes This Workflow Special
Dual Mode: Image-to-Video AND Text-to-Video
This workflow is designed to handle both I2V (Image-to-Video) and T2V (Text-to-Video) generation:
Image-to-Video Mode - Load an input image and the workflow animates it through four refinement phases, adding motion, upscaling resolution, and interpolating frames. All four phases work seamlessly together, with each phase building on the previous one.
Text-to-Video Mode - Disable the image input using the CR Image Input Switch (4 way) node and let the workflow generate video purely from text prompts. This is where the WildcardPromptFromString node becomes critical—T2V quality lives and dies by your prompt, and wildcards let you generate diverse, high-quality variations across large batches.
Important T2V Limitation: Unlike I2V mode where phases can extend or enhance the video, T2V doesn't work well with multiple phases. Each phase generates a new scene from the text prompt rather than continuing the previous generation, breaking visual continuity. For T2V generation, you'll typically want to use only Phase 1 and disable Phases 2-4 using the Fast Groups Bypasser.
The same memory management and batch processing capabilities work identically in both modes—you can still run 30+ T2V generations overnight without crashes.
Multi-Phase Architecture: Doing More With Less
Instead of trying to generate your entire video in one massive operation that would crash most systems, this workflow uses a smart phase-based approach with four separate video generation phases:
The 4-Phase System - Each phase runs a complete WAN video generation cycle. This allows the workflow to:
Generate longer videos by chaining multiple generation cycles together
Overcome VRAM limitations by processing in manageable chunks
Clean memory between phases to prevent crashes during batch processing
How It Works: Each WanImageToVideo node generates a video segment. The output from one phase can feed into the next, allowing you to extend video length beyond what a single generation could produce. Between each phase, RAM cleaners and model unloaders free up memory, resetting the system for the next generation cycle.
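Conceptually, the phase chaining and cleanup behave like the loop below. This is an illustrative Python sketch of the pattern only, not code from the workflow or ComfyUI; generate_segment and free_memory are hypothetical stand-ins for one WanImageToVideo generation cycle and the cleanup nodes.

```python
# Illustrative sketch of the relay pattern, not code from the workflow or ComfyUI.
# generate_segment() and free_memory() are hypothetical stand-ins for one
# WanImageToVideo generation cycle and the RAM/VRAM cleanup nodes.

def generate_segment(start_image, phase):
    """Stand-in for one complete WAN generation cycle (one phase)."""
    return [f"phase{phase}_frame{i}" for i in range(81)]   # e.g. 81 frames per phase

def free_memory(phase):
    """Stand-in for SoftFullCleanRAMAndVRAM / SoftModelUnloader between phases."""
    print(f"cleanup after phase {phase}: models unloaded, RAM/VRAM cleared")

def run_workflow(input_image, active_phases=2):
    all_frames = []
    handoff = input_image                  # Phase 1 starts from the input image
    for phase in range(1, active_phases + 1):
        segment = generate_segment(handoff, phase)
        all_frames.extend(segment)
        handoff = segment[-1]              # the last frame seeds the next phase
        free_memory(phase)                 # clean slate before the next cycle
    return all_frames

print(len(run_workflow("input.png", active_phases=2)))   # 162 frames total
```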
Real-World Production Usage:
1 phase = Quick single generations, T2V workflows (~6-8 seconds of video)
2 phases = Standard production sweet spot (~12 seconds of video, ~15-20 minutes to generate with lightning models)
3-4 phases = Longer showcase videos (~20-34 seconds of video, extended generation time)
Most daily production work uses 1-2 phases for efficiency and speed. Using 2 phases with lightning-based models, you can generate approximately 3 twelve-second videos per hour, enabling 30-50+ videos in overnight batch runs. The 3-4 phase capability is available when you need longer content for special projects.
The Power of Independence: Each phase can be disabled individually or in groups using the Fast Groups Bypasser node. Need just one quick generation? Use only Phase 1. Want to chain generations together for longer videos? Enable multiple phases. This modular design means you can dial in exactly what you need without wasting processing time or resources.
Note: This phase flexibility is particularly powerful for I2V workflows where phases can build on each other. For T2V workflows, you'll typically only use Phase 1 since each phase restart generates a new scene rather than continuing the previous one.
Memory Management: The Secret Sauce
The workflow includes aggressive RAM and VRAM cleaning at strategic points:
RAM Cleaners (SoftFullCleanRAMAndVRAM) - Placed between phases to clear system memory, preventing the slow memory creep that eventually crashes batch processing. These nodes ensure each phase starts with a clean slate.
Model Unloaders (SoftModelUnloader) - Actively remove models from VRAM when they're no longer needed. This is crucial for running large batches overnight without running out of video memory.
Execution Order Controllers (ImpactExecutionOrderController) - Ensure cleanup happens at exactly the right moment, forcing the workflow to finish one phase completely before moving to the next.
The Tradeoff: These cleanup nodes do add time to each generation cycle—models need to be reloaded from disk when the next batch starts. However, this is a strategic trade: spending a few extra seconds per video to reload models is vastly preferable to your entire batch crashing at video #15 because you ran out of memory. When you're running 30+ videos overnight, reliability trumps raw speed every time.
Together, these create a system that can process dozens or even hundreds of videos in a single batch without filling up memory and crashing. This is the difference between babysitting your computer and waking up to completed work.
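If you're curious what the cleanup nodes do conceptually, the pattern looks roughly like the sketch below. This is an approximation, not the actual source of SoftFullCleanRAMAndVRAM or SoftModelUnloader; the comfy.model_management calls are ComfyUI's standard helpers, and the exact sequence the LitePicker nodes use may differ.

```python
# Approximate pattern behind the cleanup nodes (not their actual source code).
import gc
import torch
import comfy.model_management as mm   # ComfyUI's built-in model/VRAM manager

def soft_clean_ram_and_vram():
    mm.unload_all_models()            # drop loaded UNET/CLIP/VAE weights from VRAM
    gc.collect()                      # let Python release unreferenced objects (RAM)
    if torch.cuda.is_available():
        torch.cuda.empty_cache()      # hand freed CUDA blocks back to the driver
    mm.soft_empty_cache()             # ComfyUI's own cache flush
```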
LoRA Management System
LoRA Loader (LoraManager) - A sophisticated LoRA loading system that tracks which LoRAs are applied, their strengths, and their trigger words. This isn't just loading LoRAs—it's managing them intelligently.
Debug Metadata (LoraManager) - Captures all LoRA information into metadata that's compatible with Civitai and other platforms. When you upload your videos, people can see exactly what LoRAs you used.
TriggerWord Toggle - Allows you to easily enable or disable LoRA trigger words without editing your prompt, giving you quick A/B testing capability.
Central Control: WAN Settings Controller (Custom Node)
The WanSettingsController is a custom-built node designed specifically for this workflow. It solves one of the biggest pain points in complex video workflows: changing settings across dozens of connected nodes.
The Problem It Solves: In traditional workflows, adjusting your video resolution means hunting through your canvas, finding every node that needs width/height/frame settings, and manually updating each one. Miss a single node and your workflow breaks. Change your mind about aspect ratio? Start the treasure hunt again.
The Solution: This custom controller is your command center. Instead of hunting through dozens of nodes to change resolution, you adjust one dropdown menu and the entire workflow updates automatically.
Key Features:
24 Pre-Validated Resolutions - Every resolution is tested and confirmed to work with WAN 2.2 14B, from mobile-friendly 576×1024 portrait to cinema-quality 1920×1080 landscape
Dimension Locking - All resolutions are mathematically locked to multiples of 16 (WAN's technical requirement), so you'll never accidentally break your workflow with invalid settings
Aspect Ratio Labels - Every resolution clearly shows its aspect ratio (9:16, 16:9, 1:1, etc.) so you know exactly what you're getting
Optimized Defaults - The 960×1216 (10:16) resolution is marked as the sweet spot for quality vs. performance
Five Outputs, One Source - Width, Height, Length (frame count), Frame Rate, and Batch Size all flow from this single node to everywhere they're needed
Real-World Impact: Change your aspect ratio in one second, not five minutes. Test different resolutions without rewiring your workflow. Scale from portrait to landscape with a single dropdown selection.
This isn't just convenience—it's what makes testing and iteration actually feasible at production scale. When you're generating 50+ videos per day, this node saves hours of workflow management time.
Node Types Explained (The Building Blocks)
Custom Workflow Control
WanSettingsController - This is a custom-built node created specifically for this workflow. It centralizes all video settings into a single control point with 24 pre-validated WAN-compatible resolutions. Think of it as replacing a dozen knobs scattered across a control panel with one master dial. When you change the resolution dropdown, width, height, length, frame_rate, and batch_size outputs automatically update across your entire workflow. This eliminates the tedious (and error-prone) process of manually updating settings in multiple nodes. The node includes portrait orientations (576×1024 to 1080×1920), square formats (768×768, 1024×1024), and landscape orientations (832×480 to 1920×1080), all validated to work with WAN 2.2 14B and locked to 16-pixel multiples. This single innovation transforms workflow iteration from a chore into a single click.
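For the curious, a settings-hub node of this kind needs surprisingly little code. The sketch below shows the general shape of such a ComfyUI node (a resolution dropdown fanned out to five outputs); it is a simplified illustration with a truncated resolution list and hypothetical defaults, not the wan_settings_controller.py file shipped with the workflow.

```python
# Simplified illustration of a settings-hub node; not the shipped
# wan_settings_controller.py. Resolution list truncated, defaults are examples.

RESOLUTIONS = {
    "576x1024  (9:16 portrait)":    (576, 1024),
    "960x1216  (default portrait)": (960, 1216),
    "1024x1024 (1:1 square)":       (1024, 1024),
    "1920x1080 (16:9 landscape)":   (1920, 1080),
}

class WanSettingsController:
    CATEGORY = "video/settings"
    RETURN_TYPES = ("INT", "INT", "INT", "FLOAT", "INT")
    RETURN_NAMES = ("width", "height", "length", "frame_rate", "batch_size")
    FUNCTION = "get_settings"

    @classmethod
    def INPUT_TYPES(cls):
        return {"required": {
            "resolution": (list(RESOLUTIONS.keys()),),
            "length":     ("INT",   {"default": 81, "min": 1}),
            "frame_rate": ("FLOAT", {"default": 16.0, "min": 1.0}),
            "batch_size": ("INT",   {"default": 1, "min": 1}),
        }}

    def get_settings(self, resolution, length, frame_rate, batch_size):
        width, height = RESOLUTIONS[resolution]
        # Every listed resolution is a multiple of 16, WAN's dimension requirement.
        assert width % 16 == 0 and height % 16 == 0
        return (width, height, length, frame_rate, batch_size)

NODE_CLASS_MAPPINGS = {"WanSettingsController": WanSettingsController}
```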
Core Processing Nodes
KSamplerAdvanced (8 instances) - The heavy lifters that actually generate images and video frames using diffusion models. These nodes handle the AI's creative process, iteratively refining noise into coherent visuals.
WanImageToVideo (4 instances) - Specialized nodes that convert images into video using the WAN 2.2 14B model. Each instance handles one phase of the video generation pipeline.
VAEDecode (4 instances) - Converts latent space representations (the compressed format the AI works in) back into actual pixels you can see. Every image has to pass through a VAE to become viewable.
Video Creation and Export
CreateVideo (5 instances) - Assembles individual frames into video files, handling frame rates, codecs, and timing.
SaveVideo (5 instances) - Writes the completed videos to your drive with proper naming and metadata.
RIFE VFI - The frame interpolation engine that creates smooth in-between frames, doubling (or more) your effective frame rate using optical flow estimation.
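As a quick worked example of what interpolation buys you, assume a 2x pass inserts one new frame between each pair of originals (exact frame counts can differ slightly between interpolation nodes):

```python
# Hypothetical numbers: an 81-frame WAN clip rendered at 16 fps.
src_frames, src_fps = 81, 16
print(src_frames / src_fps)                      # ~5.06 s of video

multiplier = 2                                   # a 2x RIFE pass
out_frames = (src_frames - 1) * multiplier + 1   # 161 frames (assumed formula)
out_fps = src_fps * multiplier                   # play back at 32 fps
print(out_frames / out_fps)                      # ~5.03 s: same clip, twice the samples
```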
Workflow Organization
ReroutePrimitive|pysssss (46 instances) - These are like junction boxes in electrical wiring. They let you connect distant nodes without spaghetti cables crossing your entire canvas. Essential for keeping complex workflows readable.
Fast Groups Bypasser (rgthree) - Your phase control panel. This single node lets you enable or disable entire groups of nodes, making it trivial to test specific phases or skip unnecessary processing.
Power Primitive (rgthree) - A smarter primitive node that can feed values to multiple inputs simultaneously, reducing clutter.
ImpactExecutionOrderController (4 instances) - Forces specific execution sequences, crucial for ensuring memory cleanup happens between phases rather than at random times.
Image Processing
ImageScale - Resizes images while maintaining quality, used in the upscaling phase.
ImageFromBatch - Extracts individual images from batch processing, useful for preview and quality checks.
ImageBatchMulti (3 instances) - Combines multiple images into batches for efficient processing.
CR Image Input Switch (4 way) - A critical router that lets you switch between four different input images OR disable image input entirely for Text-to-Video generation. This is your I2V/T2V mode selector—when you want pure T2V, this node cuts out the image input and lets the model generate from prompt alone. No rewiring needed to switch between modes.
PreviewImage (3 instances) - Displays images during generation so you can monitor progress without waiting for final output.
Text and Prompt Handling
CLIPTextEncode (2 instances) - Converts your text descriptions into the mathematical format (embeddings) that the AI understands.
Power Prompt - Simple (rgthree) - An enhanced prompt node with better formatting and organization options.
WildcardPromptFromString - Critical for T2V generation. Enables prompt randomization using wildcards (like {adjective}, {action}, {lighting}), letting you generate diverse variations across large batches. In Text-to-Video mode, prompt quality is everything—a mediocre prompt gets mediocre results, while a well-crafted prompt with strategic wildcards produces usable, compelling video. This node is your secret weapon for batch diversity: instead of generating 30 identical videos, you're generating 30 unique variations by randomly combining different descriptive elements. Essential for maintaining quality and variety in T2V workflows.
JoinStringMulti - Combines multiple text strings into one, useful for building complex prompts from modular pieces.
Model Loading
UNETLoader (2 instances) - Loads the WAN 2.2 14B model components. WAN 2.2 14B is a two-model system: a high-noise model and a low-noise model that work together during video generation, with the high-noise expert handling the earlier, noisier denoising steps and the low-noise expert refining the later ones. The two UNET loaders load both components the workflow needs.
CLIPLoader - Loads the text encoder that converts words into concepts the AI understands.
VAELoader - Loads the VAE (Variational Autoencoder) that converts between latent space and pixel space.
CLIPSetLastLayer - Controls how many layers of the text encoder to use, allowing fine-tuning of how literally the AI interprets prompts.
ModelSamplingSD3 (2 instances) - Sets the sampling shift for the two WAN models. Despite the name, this node isn't limited to Stable Diffusion 3; it configures the flow-matching sampling schedule that WAN also uses, which influences generation quality and characteristics.
Utility Nodes
MathExpression|pysssss (3 instances) - Performs calculations on values in your workflow, useful for dynamic frame counts, resolution scaling, and parameter adjustments.
VHS_GetImageCount (3 instances) - Counts frames in video sequences, essential for phase coordination and batch processing.
Number of Active Phases - Critical control node that must be set to match how many phases you're actually using (1-4). If you're using only Phase 1, set this to 1. Using Phases 1-3? Set it to 3. This node coordinates the workflow's execution and must match your Fast Groups Bypasser settings. Forgetting to set this correctly will cause workflow errors.
MarkdownNote - A documentation node where you can write notes about what different sections of your workflow do. Invaluable for complex setups.
ShowText|pysssss - Displays text values for debugging and confirmation that settings are correct.
SaveImageWithMetaData - Saves images with embedded generation parameters, so you can always recreate your results.
Memory Management (Critical!)
SoftFullCleanRAMAndVRAM|LP (2 instances) - Aggressively frees both system RAM and GPU VRAM, preventing memory accumulation during batch processing. Yes, this adds a few seconds to reload models at the start of each new generation, but that's the price of reliability—without these cleaners, you'd crash mid-batch instead of completing 30+ videos overnight.
SoftModelUnloader|LP - Removes models from VRAM when they're no longer needed, freeing space for subsequent phases. The model reload time is negligible compared to the hours you'd lose restarting a crashed batch.
The Big Picture: How It All Works Together
Think of this workflow like a production line in a factory:
Raw materials enter (your input image/prompt and settings from WanSettingsController)
Station 1 (Phase 1) runs a complete video generation cycle
Cleanup crew clears the workspace (RAM/VRAM cleaning)
Station 2 (Phase 2) runs another video generation cycle (optional—can be disabled)
Cleanup crew strikes again
Station 3 (Phase 3) runs another generation cycle (optional—can be disabled)
Final cleanup
Station 4 (Phase 4) runs the final generation cycle (optional—can be disabled)
Quality control (preview nodes show results)
Shipping (SaveVideo writes final files)
Each station runs a complete WAN generation cycle independently. If a station isn't needed, you turn it off with the Fast Groups Bypasser. If you're running multiple products (batch processing), the cleanup crews ensure the workspace never gets cluttered between generations.
Why This Matters
For Beginners: You get a professional workflow that handles complexity behind the scenes. Change one setting, get reliable results. Start with 1-2 phases for quick wins.
For Veterans: You get granular control over every phase, with the ability to disable what you don't need and batch process at scale without crashes. Optimize for your use case—quick 12-second videos for volume, or longer showcase pieces when quality demands it.
For Everyone: You get longer videos, higher resolution, smoother motion, and the ability to run it all overnight without running out of memory. With lightning models, expect approximately 3 twelve-second videos per hour using 2 phases, enabling 30-50+ video overnight batch runs.
This workflow represents months of optimization, testing, and problem-solving, all packaged into a system that "just works." Whether you're making content for social media, testing LoRAs, or pushing the boundaries of AI video generation, this workflow gives you the tools to do it efficiently and reliably.
Technical Requirements
ComfyUI with WAN 2.2 14B model support
WAN 2.2 14B Models: You need both the "high" and "low" model files (WAN 14B is a two-model system)
VRAM: Minimum 12GB for base operation, 16GB+ recommended for higher resolutions
RAM: 32GB+ recommended for batch processing
Custom Nodes Required:
ComfyUI-Impact-Pack (for execution controllers)
ComfyUI-Custom-Scripts (for math expressions)
rgthree-comfy (for power nodes and bypasser)
LitePicker/ComfyUI-MemoryManagement (for RAM/VRAM cleaners)
LoraManager nodes
WAN Settings Controller (custom node - INCLUDED with this workflow!)
RIFE VFI nodes
ComfyUI-VideoHelperSuite (for VHS_GetImageCount and other video utilities)
Getting Started
Installing the WAN Settings Controller (Custom Node)
The custom WanSettingsController node is included as wan_settings_controller.py. To install:
Copy wan_settings_controller.py to your ComfyUI/custom_nodes/ directory
Restart ComfyUI
The node will appear in the video/settings category
That's it! The node is self-contained with all 24 validated resolutions built in. No dependencies beyond base ComfyUI.
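If you prefer to script the copy step, it amounts to a single file operation. The paths below are examples only; point them at your actual ComfyUI installation:

```python
# Example only: adjust the paths to match your own ComfyUI installation.
import shutil
from pathlib import Path

src = Path("wan_settings_controller.py")                   # file from this download
dst = Path.home() / "ComfyUI" / "custom_nodes" / src.name  # assumed install location
shutil.copy(src, dst)
print(f"Copied to {dst}. Restart ComfyUI to load the node.")
```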
Running the Workflow
For Image-to-Video (I2V):
Install all required custom nodes (see Technical Requirements above)
Install the WanSettingsController custom node
Load the workflow in ComfyUI
Select your desired resolution in WanSettingsController dropdown
Set the "Number of Active Phases" node to match how many phases you're using (1-4)
If using only Phase 1, set to 1
If using Phases 1-3, set to 3
This must match the phases you have enabled via Fast Groups Bypasser
Load your input image using the CR Image Input Switch
Enable/disable phases as needed using Fast Groups Bypasser
Queue and let it run!
For Text-to-Video (T2V):
Follow steps 1-4 above
Set the "Number of Active Phases" node to 1 (T2V only uses Phase 1)
Use the CR Image Input Switch to disable image input
Disable Phases 2-4 using Fast Groups Bypasser
Each phase generates a new scene from the prompt, breaking continuity
Use only Phase 1 for T2V generation
Craft your prompt using the WildcardPromptFromString node
Use wildcards for variation: {lighting|golden hour|dramatic shadows|soft diffused}
Build modular prompts: {subject} in {location}, {camera angle}, {mood}
Remember: T2V quality depends heavily on prompt quality—invest time here
Queue your batch and review results
T2V Pro Tip: Since you're only using Phase 1, generation is much faster than multi-phase I2V. This makes T2V ideal for rapid iteration and testing—you can generate and evaluate prompts quickly, then scale up to larger batches once you've dialed in your wildcards.
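To make the wildcard mechanics concrete, here is a small illustrative expander for the {option|option|option} syntax used in the steps above. It is not the WildcardPromptFromString node's actual implementation, just a sketch of the behavior a batched T2V run relies on:

```python
import random
import re

def expand_wildcards(template, seed=None):
    """Pick one option at random from each {a|b|c} group in the template."""
    rng = random.Random(seed)
    return re.sub(
        r"\{([^{}]+)\}",
        lambda m: rng.choice(m.group(1).split("|")).strip(),
        template,
    )

template = (
    "a woman walking {toward the camera|along a beach|through a market}, "
    "{golden hour|dramatic shadows|soft diffused} lighting, "
    "{low angle|handheld|slow dolly} shot"
)

# Each call yields a different combination, so a 30-video batch gets 30 variations.
for i in range(3):
    print(expand_wildcards(template, seed=i))
```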
The workflow handles the rest, managing memory, coordinating phases, and producing high-quality video without requiring constant supervision.
This workflow is designed for both learning and production. Study how the phases interact, experiment with disabling different sections, and scale up to batch processing when you're ready. The modular design means you can understand one piece at a time while still having a complete, working system from day one.
