LTX-2.3 ICLoRA LipDub in ComfyUI | Precise Lip-Sync Video Creation
Turn any video into a convincingly lip-synced talking video.
Who it's for: creators who want this pipeline in ComfyUI without assembling nodes from scratch. Not for: one-click results with zero tuning - you still choose inputs, prompts, and settings.
Open preloaded workflow on RunComfy
Why RunComfy first
- Fewer missing-node surprises - run the graph in a managed environment before you mirror it locally.
- Quick GPU tryout - useful if your local VRAM or install time is the bottleneck.
- Matches the published JSON - the zip follows the same runnable workflow you can open on RunComfy.
When downloading for local ComfyUI makes sense - you want full control over models on disk, batch scripting, or offline runs.
How to use (local ComfyUI)
1. Load inputs (images/video/audio) in the marked loader nodes.
2. Set prompts, resolution, and seeds; start with a short test run.
3. Export from the Save / Write nodes shown in the graph.
Expectations - the first run may download large model weights, and cloud runs may require a free RunComfy account.
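The steps above can also be scripted against a local ComfyUI server, which is one way to do the batch runs mentioned earlier. The following is a minimal sketch that rests on several assumptions. It assumes you exported the workflow in API format from the ComfyUI menu, that the server listens on the default `http://127.0.0.1:8188`, and that runs are queued through ComfyUI's `/prompt` endpoint. The helper names and the file name are hypothetical. The node id and seed input name are also placeholders (depending on the node, the input may be `seed` or `noise_seed`), so read the real ones from your own export.

```python
import copy
import json
import urllib.request

# Assumption: default address of a local ComfyUI server.
COMFY_URL = "http://127.0.0.1:8188"


def with_overrides(workflow: dict, overrides: dict) -> dict:
    """Return a copy of an API-format workflow with selected node inputs replaced.

    `overrides` maps (node_id, input_name) -> value; the original dict is untouched.
    """
    patched = copy.deepcopy(workflow)
    for (node_id, input_name), value in overrides.items():
        if node_id not in patched:
            raise KeyError(f"node {node_id!r} not found in workflow")
        patched[node_id]["inputs"][input_name] = value
    return patched


def queue_prompt(workflow: dict, url: str = COMFY_URL) -> dict:
    """POST an API-format workflow to ComfyUI's /prompt endpoint and return the JSON reply."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    request = urllib.request.Request(  # data is set, so this is sent as a POST
        f"{url}/prompt", data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


def run_seed_sweep(path: str, seed_node: str, seeds: list, seed_input: str = "seed") -> list:
    """Hypothetical helper: queue one short test run per seed from an exported workflow."""
    with open(path, encoding="utf-8") as f:
        base = json.load(f)
    return [queue_prompt(with_overrides(base, {(seed_node, seed_input): s})) for s in seeds]
```

For example, `run_seed_sweep("lipdub_api.json", "<seed node id>", [1, 2, 3])` queues three variants. Their outputs are written wherever the graph's Save / Write nodes put them.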
Overview
This workflow generates synchronized talking videos from your existing footage and audio tracks. It is built on Lightricks' LTX-2.3 model with the LipDub IC-LoRA and drives natural lip movement from the speech, so you can create convincing dialogue scenes without manual editing. It suits creators who need accurate speech sync and expressive realism. Because the input and output nodes are fixed, runs are easy to reproduce.
Key nodes in the ComfyUI LTX-2.3 ICLoRA LipDub workflow
LTXICLoRALoaderModelOnly (#5012)
Loads the LipDub IC-LoRA and attaches it to the base model, so that lip motion follows the input speech without overriding the speaker's identity. For stronger or subtler lip control, adjust the LoRA weight here. Balance it against any other LoRAs in the stack to avoid over-conditioning.
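It helps to recall how LoRA deltas combine, because that explains why weights in a stack need balancing. The sketch below uses the standard LoRA formulation, W' = W + sum_i s_i * (B_i @ A_i), with the alpha/rank factor folded into each strength. It illustrates that general math only and is not the code of `LTXICLoRALoaderModelOnly`. It assumes the node combines LoRAs additively, like ComfyUI's other LoRA loaders. The function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product: rows of `a` times columns of `b`."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def apply_lora_stack(w: Matrix, loras: list) -> Matrix:
    """Standard LoRA merge: W' = W + sum_i strength_i * (B_i @ A_i).

    `loras` is a list of (strength, B, A) tuples; alpha / rank scaling is
    assumed to be folded into `strength`. The input matrix is not modified.
    """
    out = [row[:] for row in w]
    for strength, b, a in loras:
        delta = matmul(b, a)  # low-rank update with the same shape as W
        for i, row in enumerate(delta):
            for j, d in enumerate(row):
                out[i][j] += strength * d
    return out
```

When two LoRAs push the same weights in the same direction, their deltas add. This is why raising one LoRA's weight usually calls for lowering another's.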
LTXAddVideoICLoRAGuide (#5004)
Applies IC-LoRA guidance at the low-resolution stage using the downscaled reference frames. This is where the workflow first locks in identity and mouth-region attention, which makes it a good place for A/B tests: bypass the guide while keeping the seed fixed, then compare timing and articulation with and without reference guidance.
LTXAddVideoICLoRAGuide (#5014)
Reapplies IC-LoRA guidance at high resolution with the s2 (second-stage) frames, so the refined pass keeps the same speaker identity and accurate lip shapes. If you change the high-resolution frame size, update this node as well so the reference guide matches your target output.
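If you change the s2 size, the two stages must stay consistent with the x2 upscaler. The helper below is a hypothetical sketch that picks matching stage-1 and s2 sizes from a target resolution. It assumes s2 is exactly twice stage 1 and that both must be multiples of 32, which is a common LTX constraint but is not confirmed here for LTX-2.3. Sizes are rounded to the nearest valid value, so resize or crop your input video and reference frames to match.

```python
def stage_sizes(target_w: int, target_h: int, multiple: int = 32) -> tuple:
    """Return ((w1, h1), (w2, h2)): stage-1 size and s2 size for a 2x latent upscale.

    Assumptions: s2 is exactly 2x stage 1, and both must be divisible by
    `multiple`. s2 is therefore rounded to the nearest multiple of 2 * multiple,
    so that halving it still yields a valid stage-1 size.
    """
    step = 2 * multiple
    w2 = max(step, (target_w + step // 2) // step * step)  # nearest multiple, ties round up
    h2 = max(step, (target_h + step // 2) // step * step)
    return (w2 // 2, h2 // 2), (w2, h2)
```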
LTXVSetAudioRefTokens (#5006)
Binds the encoded speech to your text conditioning so the sampler aligns visemes (mouth shapes) with phonemes. Use the same audio latent across both passes for stable results. The graph does this automatically, but if you swap the audio mid-run, refresh both the conditioning and the concatenated latent.
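A related practical step is keeping the video length consistent with the speech. The helper below picks a frame count that covers an audio clip at a given frame rate. It assumes that, as with earlier LTX-Video models, the frame count must have the form 8n + 1; check which constraint your LTX-2.3 nodes actually enforce. The function name and defaults are illustrative.

```python
import math


def frames_for_audio(duration_s: float, fps: float, multiple: int = 8) -> int:
    """Smallest frame count of the form multiple * n + 1 that covers the audio.

    The 8n + 1 form is an assumption carried over from earlier LTX-Video models.
    """
    # Round before ceil so float noise (e.g. 0.1 * 30 = 3.0000000000000004)
    # does not add a spurious extra frame.
    needed = max(1, math.ceil(round(duration_s * fps, 6)))
    n = math.ceil((needed - 1) / multiple)
    return multiple * n + 1
```

For instance, a 5-second clip at 24 fps needs 120 frames, and the next valid count is 121.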
LTXVLatentUpsampler (#4975)
Upscales the video latent 2x with the LTX-2.3 Spatial Upscaler x2, adding room for fine detail before the high-resolution sampler. If VRAM is tight, pair this with smaller s2 dimensions or heavier tiling (more, smaller tiles) in the decoder to balance quality and throughput.
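A quick back-of-the-envelope calculation shows why the x2 latent upscale is where VRAM pressure appears. The sketch assumes an LTX-style VAE with 32x spatial compression, 8x temporal compression, and 128 latent channels. These values come from earlier LTX-Video releases and are not confirmed for LTX-2.3. Under any fixed compression ratio, however, doubling width and height quadruples the number of latent positions the high-resolution sampler must process.

```python
import math


def latent_shape(width: int, height: int, frames: int,
                 channels: int = 128, spatial: int = 32, temporal: int = 8) -> tuple:
    """(C, T, H, W) latent shape under the assumed LTX-style compression."""
    return (channels, (frames - 1) // temporal + 1, height // spatial, width // spatial)


def latent_positions(shape: tuple) -> int:
    """Number of spatio-temporal positions (tokens), ignoring the channel axis."""
    return math.prod(shape[1:])
```

For example, 121 frames at 640x352 give 3,520 positions, while the same clip at 1280x704 gives four times as many. With attention over all positions, sampler cost can grow faster than that 4x count, so shrinking s2 is the most direct lever.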
LTXVTiledVAEDecode (#4995)
Decodes the final latent to frames in tiles so that large outputs fit on GPUs with limited memory. Tile count and overlap trade speed against memory. Fewer tiles are faster but need more VRAM, while more tiles reduce VRAM at the cost of time. A larger overlap helps hide seams between tiles but adds redundant decoding.
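A one-dimensional split makes the tile-count/overlap trade-off concrete. This is a generic sketch of overlapping tiling, not the node's actual implementation, which may tile in latent units and along time as well as space. Here n tiles of size s that overlap by o cover n*s - (n - 1)*o pixels. Adding tiles shrinks each tile, which lowers peak memory per decode call, while the overlap adds pixels that are decoded twice, which costs time. The function name is hypothetical.

```python
import math


def tile_spans(length: int, tiles: int, overlap: int) -> list:
    """Split [0, length) into `tiles` equal spans whose neighbours share `overlap` pixels."""
    if tiles < 1 or not 0 <= overlap < length:
        raise ValueError("need tiles >= 1 and 0 <= overlap < length")
    if tiles == 1:
        return [(0, length)]
    # n tiles of size s overlapping by o cover n*s - (n - 1)*o pixels; solve for s.
    size = math.ceil((length + (tiles - 1) * overlap) / tiles)
    stride = size - overlap
    spans = []
    for i in range(tiles):
        start = min(i * stride, length - size)  # clamp the last tile to the edge
        spans.append((start, start + size))
    return spans
```

For a 1280-pixel-wide frame, 4 tiles with 64 pixels of overlap give 368-pixel tiles and decode 1472 pixels of width in total. That is 15% extra work in exchange for a much smaller tile.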
Notes
See the RunComfy page for the latest node requirements.

