Hunyuan Video 1.5 in ComfyUI | Efficient Text-to-Video Workflow
Turn text or images into smooth 1080p videos quickly and easily.
Who it's for: creators who want this pipeline in ComfyUI without assembling nodes from scratch. Not for: one-click results with zero tuning — you still choose inputs, prompts, and settings.
Open preloaded workflow on RunComfy (browser)
Why RunComfy first
- Fewer missing-node surprises — run the graph in a managed environment before you mirror it locally.
- Quick GPU tryout — useful if your local VRAM or install time is the bottleneck.
- Matches the published JSON — the zip follows the same runnable workflow you can open on RunComfy.
When downloading for local ComfyUI makes sense — you want full control over models on disk, batch scripting, or offline runs.
How to use (local ComfyUI)
1. Load inputs (images/video/audio) in the marked loader nodes.
2. Set prompts, resolution, and seeds; start with a short test run.
3. Export from the Save / Write nodes shown in the graph.
Expectations — First run may pull large weights; cloud runs may require a free RunComfy account.
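For scripted short test runs, step 2 can be sketched in Python against a workflow exported in ComfyUI's API format. This is a minimal sketch under stated assumptions: the node id "183" comes from the node list on this page, but the input names (width, height, length, batch_size) and the sample values are illustrative and should be checked against your exported JSON. The helper names are hypothetical.

```python
import copy
import json


def patch_workflow(workflow, node_id, **inputs):
    """Return a copy of an API-format workflow with some inputs of one node replaced."""
    patched = copy.deepcopy(workflow)  # leave the caller's workflow untouched
    node = patched[node_id]
    for name, value in inputs.items():
        # Refuse unknown input names so typos do not silently add new keys.
        if name not in node["inputs"]:
            raise KeyError(f"node {node_id} has no input named {name!r}")
        node["inputs"][name] = value
    return patched


def build_prompt_payload(workflow, client_id="local-test"):
    """Serialize a request body of the form used by ComfyUI's POST /prompt endpoint."""
    return json.dumps({"prompt": workflow, "client_id": client_id})


# Toy stand-in for the exported workflow; input names are assumptions.
demo_workflow = {
    "183": {
        "class_type": "EmptyHunyuanVideo15Latent",
        "inputs": {"width": 1280, "height": 720, "length": 121, "batch_size": 1},
    }
}

# Shrink resolution and frame count for a quick first run (values are illustrative).
short_test = patch_workflow(demo_workflow, "183", width=848, height=480, length=25)
payload = build_prompt_payload(short_test)
```

Sending `payload` to a running ComfyUI server is left out here so the sketch runs without a server. Once the short run looks right, patch the same node back to the full settings.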
Overview
This workflow turns text prompts or a start image into video with clear, natural-looking motion. The model uses a diffusion transformer (DiT) design and aims for realistic movement with a comparatively small parameter count, so it suits designers who want creative control without heavy hardware. A latent upscaling stage takes outputs to 1080p. The graph exposes the main settings directly, which cuts down on tuning time. Typical uses are rapid concept visualization, promo clips, and AI-driven storytelling.
Key nodes in the ComfyUI Hunyuan Video 1.5 workflow
HunyuanVideo15ImageToVideo (#78)
Generates a video by conditioning on a start image and your prompts. Adjust its resolution and total frames to match your creative target. Higher resolutions and longer clips increase VRAM and time. This node is central to image-to-video quality because it fuses CLIP-Vision features with text guidance before sampling.
EmptyHunyuanVideo15Latent (#183)
Initializes the empty latent for text-to-video from a width, height, and frame count. Set the sequence length here, because the scheduler and sampler downstream denoise whatever shape this node produces. Keep the aspect ratio consistent with your intended output to avoid extra padding later.
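To see how width, height, and frame count translate into latent size (and so into VRAM and time), the arithmetic can be sketched as below. The compression factors used here (16× spatial, 4× temporal, 32 latent channels) are assumptions about the Hunyuan Video 1.5 VAE, not values stated on this page; they are exposed as parameters so you can change them after checking the node source.

```python
def latent_shape(width, height, frames, batch_size=1,
                 channels=32, spatial_factor=16, temporal_factor=4):
    """Shape of the latent tensor for a requested clip.

    All default factors are assumptions; verify them for your model.
    """
    latent_frames = (frames - 1) // temporal_factor + 1
    return (batch_size, channels, latent_frames,
            height // spatial_factor, width // spatial_factor)


def snapped_request(width, height, frames, spatial_factor=16, temporal_factor=4):
    """Round a request down to sizes that map exactly onto the latent grid."""
    snapped_width = (width // spatial_factor) * spatial_factor
    snapped_height = (height // spatial_factor) * spatial_factor
    # Frame counts of the form temporal_factor * n + 1 fill every latent frame.
    snapped_frames = ((frames - 1) // temporal_factor) * temporal_factor + 1
    return snapped_width, snapped_height, snapped_frames


shape_720p = latent_shape(1280, 720, 121)
```

Under these assumptions, a 1280×720 clip of 121 frames maps to 31 latent frames on a 45×80 grid. Doubling the frame count roughly doubles the number of latent values the sampler has to process.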
CFGGuider (#129)
Sets classifier-free guidance strength, balancing prompt adherence against naturalness. Increase guidance to follow the prompt more strictly; lower it to reduce oversaturation and flicker. Use moderate values during base generation and lower guidance for super-resolution refinement.
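The trade-off described above follows from the standard classifier-free guidance formula, sketched here on plain Python lists. This illustrates the general technique rather than the CFGGuider node's actual code; the function name and sample values are hypothetical.

```python
def cfg_combine(cond, uncond, scale):
    """Classifier-free guidance: push the prediction away from the
    unconditional output, toward the prompt-conditioned one."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


cond_pred = [1.0, 2.0]    # model output with the prompt (toy values)
uncond_pred = [0.0, 1.0]  # model output with an empty/negative prompt (toy values)

no_guidance = cfg_combine(cond_pred, uncond_pred, 1.0)      # equals cond_pred
strong_guidance = cfg_combine(cond_pred, uncond_pred, 3.0)  # overshoots cond_pred
```

At a scale of 1.0 the result is just the conditional prediction. Larger scales extrapolate past it, which is why high guidance follows the prompt more strictly but can oversaturate.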
BasicScheduler (#126)
Controls the number of denoising steps and the schedule. More steps usually mean better detail and stability but longer renders. Pair step count with sampler choice for best results; this workflow defaults to a fast, general-purpose sampler.
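As a rough picture of what a schedule is, the sketch below builds a linearly spaced list of noise levels (sigmas) with one more entry than the step count. The linear spacing and the 1.0 to 0.0 range are assumptions chosen for illustration; the schedule actually used depends on the scheduler selected in the node.

```python
def linear_sigmas(steps, sigma_max=1.0, sigma_min=0.0):
    """Illustrative schedule: steps + 1 noise levels from sigma_max down to sigma_min."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    span = sigma_max - sigma_min
    return [sigma_max - span * i / steps for i in range(steps + 1)]


four_step = linear_sigmas(4)
twenty_step = linear_sigmas(20)
```

Each sampler step moves from one sigma to the next, so more steps means smaller jumps in noise level per step, at the cost of one more model evaluation each.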
SamplerCustomAdvanced (#125)
Executes the denoising loop with your selected sampler and guidance. In the 1080p finishing chain, it works in two phases split by SplitSigmas: the first establishes structure at high noise, and the second refines low-noise detail. Keep seeds fixed while tuning steps and guidance so you can compare outputs reliably.
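The two-phase split can be sketched as slicing one sigma list at a chosen step. This is a minimal sketch of the idea as I understand the SplitSigmas node to behave, with both halves sharing the boundary value; the sigma values and split step are hypothetical.

```python
def split_sigmas(sigmas, step):
    """Split a schedule into a high-noise part and a low-noise part.

    The sigma at index `step` appears in both parts, so the second phase
    resumes at exactly the noise level where the first phase stopped.
    """
    high_noise = sigmas[: step + 1]
    low_noise = sigmas[step:]
    return high_noise, low_noise


demo_sigmas = [1.0, 0.75, 0.5, 0.25, 0.0]  # toy schedule, 4 steps
structure_phase, detail_phase = split_sigmas(demo_sigmas, 2)
```

Here the first phase runs two steps from 1.0 down to 0.5 and the second runs the remaining two steps down to 0.0, so the total step count is unchanged.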
HunyuanVideo15LatentUpscaleWithModel (#109)
Rescales the latent sequence to 1920×1080 using the dedicated upsampler from the repackaged weights. Upscaling in latent space is faster and more memory-friendly than pixel-space resizing, and it sets the stage for the distilled SR model to add fine detail. Larger targets demand more VRAM; keep 16:9 for best throughput.
…
Notes
See the RunComfy page for this workflow for the latest node requirements.
