LTX 2.3 Video Re-Speaking OmniNFT + Relay Lip-Sync Replacement Workflow


Model description

Watch the full video first if you want to understand how this LTX 2.3 video re-speaking workflow works in practice. The video shows how an existing talking video can be guided by a new audio track, how the lip-sync control pipeline is organized, and how to launch the workflow online without rebuilding the full ComfyUI environment locally.

This ComfyUI workflow is designed for LTX 2.3 video re-speaking, word replacement, and audio-driven lip-sync generation. The main purpose of this workflow is to take an existing talking video or character reference and regenerate the mouth movement so it can match a new audio track. Instead of creating a completely new character from scratch, the workflow focuses on preserving the original person, framing, motion style, and video identity while changing the spoken content.

The workflow is built around the LTX 2.3 distilled 1.1 route. The main checkpoint is ltx-2.3-22b-dev-dare-ties-distilled-1.1, with Gemma3 fp8 for text encoding, the LTX Audio VAE for audio, and two LoRAs: LipDub IC LoRA and a VBVR / I2V stabilization LoRA. The main nodes are LTXVAudioVAEEncode, LTXVSetAudioRefTokens, LTXAddVideoICLoRAGuide, LTXVCropGuides, ManualSigmas, CFGGuider, SamplerCustomAdvanced, LTXVLatentUpsampler, and LTXVAudioVAEDecode, followed by the final video output.
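ManualSigmas gives the sampler an explicit noise schedule instead of a scheduler-generated one, so a typo in a hand-edited schedule can silently ruin a long render. The sketch below is a quick sanity check. It assumes the schedule is entered as comma-separated numbers and that a usable schedule is strictly decreasing and ends at 0.0; the function name is hypothetical and the example values are placeholders, not the sigmas this workflow ships with.

```python
def parse_sigmas(text: str) -> list[float]:
    """Parse a comma-separated sigma schedule and sanity-check it.

    Hypothetical helper. Assumption: a usable schedule is strictly
    decreasing and ends at 0.0.
    """
    # float() tolerates surrounding spaces; empty parts (trailing comma) are skipped.
    values = [float(part) for part in text.split(",") if part.strip()]
    if len(values) < 2:
        raise ValueError("need at least two sigmas (start and end)")
    if values[-1] != 0.0:
        raise ValueError(f"schedule should end at 0.0, got {values[-1]}")
    for prev, cur in zip(values, values[1:]):
        if cur >= prev:
            raise ValueError(f"sigmas must strictly decrease: {prev} -> {cur}")
    return values


# Placeholder schedule for illustration only; not taken from the workflow.
schedule = parse_sigmas("1.0, 0.95, 0.8, 0.5, 0.2, 0.0")
print(len(schedule) - 1, "sampling steps")
```

Each adjacent pair of sigmas is one sampling step, which is why the step count is the list length minus one.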

The key control module is LipDub IC LoRA. It helps the model focus on mouth movement, speaking rhythm, and audio-related facial changes. The workflow encodes the new audio into the audio latent space with LTXVAudioVAEEncode, then uses LTXVSetAudioRefTokens to inject audio reference tokens into the conditioning. This lets the rendering stages follow the replacement speech instead of producing only generic motion.
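Because the replacement audio drives the mouth, the rendered clip should be no longer than either the audio or the source video. The sketch below computes a render length that fits both. It is a minimal sketch under stated assumptions: the frame count follows the 8k + 1 rule used by earlier LTX-Video releases (verify this for LTX 2.3), the longer input is trimmed rather than stretched, and `fps` is the source video's frame rate. The function name is hypothetical.

```python
import math


def aligned_frame_count(audio_seconds: float, source_frames: int, fps: float) -> int:
    """Return a render length that fits within both the audio and the source clip.

    Hypothetical helper. Assumes LTX-style frame counts of the form 8k + 1 and
    that the longer input is trimmed to the shorter one.
    """
    audio_frames = math.floor(audio_seconds * fps)
    usable = min(audio_frames, source_frames)
    if usable < 1:
        raise ValueError("audio or video is too short")
    # Round down to the nearest 8k + 1 so no frames ever need padding.
    return ((usable - 1) // 8) * 8 + 1


frames = aligned_frame_count(5.0, 150, 24.0)
print(frames, "frames =", round(frames / 24.0, 2), "s")
```

Rounding down means the last fraction of a second of audio may be cut; trimming the audio file to the printed duration keeps speech and picture in step.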

The video reference is handled through LTXAddVideoICLoRAGuide. This node injects the original video or visual reference into the generation process, helping the output preserve the face, camera angle, clothing, background, and overall identity. The workflow uses this guide across multiple rendering stages, so the character does not drift too far while the mouth movement is updated.
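The video guide only preserves identity if it keeps the source framing, so the source should be resized rather than cropped. The sketch below scales a clip so its longer side is near a target size, keeps the aspect ratio, and snaps each side to a multiple of 32. The multiple-of-32 rule comes from earlier LTX-Video releases and is an assumption here; the function name and target size are illustrative.

```python
def snap_resolution(src_w: int, src_h: int, long_side: int, multiple: int = 32) -> tuple[int, int]:
    """Scale (src_w, src_h) so the longer side is about `long_side`, keeping the
    aspect ratio, then round each side to the nearest multiple of `multiple`.

    Hypothetical helper; the multiple-of-32 constraint is an assumption.
    """
    scale = long_side / max(src_w, src_h)

    def snap(value: float) -> int:
        # Never return less than one full block.
        return max(multiple, int(round(value / multiple)) * multiple)

    return snap(src_w * scale), snap(src_h * scale)


# A 1080x1920 portrait clip targeted at a 1024-pixel long side.
print(snap_resolution(1080, 1920, 1024))
```

Snapping each side independently can shift the aspect ratio by a few pixels; that small stretch is usually less harmful to the face guide than cropping the frame.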

The generation process is divided into three stages. The first stage creates the base lip-sync motion and audio-aligned structure. The second stage inherits audio tokens from the first stage and performs latent-space refinement. The third stage uses the previous output as a stronger reference and applies final high-resolution refinement. This staged structure is important because lip-sync generation is sensitive: a one-pass workflow can easily produce unstable mouths, drifting faces, flicker, or weak audio alignment.
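The three-stage structure can be written down as a small plan that records what each stage consumes. The sketch below is descriptive only: the `Stage` dataclass, its fields, the input names, and the assumptions that the latent upscale (LTXVLatentUpsampler) happens once, before the third stage, with a factor of 2, and that stage 3 also takes audio tokens from stage 2, are illustrative and not read from the workflow file. It checks that no stage depends on a later one and reports each stage's working resolution.

```python
from dataclasses import dataclass

# Hypothetical names for the graph inputs; not taken from the workflow file.
INPUTS = {"new audio (LTXVAudioVAEEncode)", "source video"}


@dataclass(frozen=True)
class Stage:
    name: str
    upscale: int            # latent upscale applied before this stage (assumed)
    audio_tokens_from: str  # origin of the audio reference tokens
    video_reference: str    # what the video guide is fed


# Illustrative plan mirroring the description; the values are assumptions.
PLAN = [
    Stage("base lip-sync", 1, "new audio (LTXVAudioVAEEncode)", "source video"),
    Stage("latent refinement", 1, "base lip-sync", "source video"),
    Stage("high-res refinement", 2, "latent refinement", "latent refinement"),
]


def check_plan(plan: list[Stage]) -> None:
    """Reject stages that depend on a later stage or on an unknown input."""
    seen = set(INPUTS)
    for stage in plan:
        for dep in (stage.audio_tokens_from, stage.video_reference):
            if dep not in seen:
                raise ValueError(f"{stage.name!r} depends on unavailable {dep!r}")
        seen.add(stage.name)


def stage_resolutions(width: int, height: int, plan: list[Stage]) -> list[tuple[int, int]]:
    """Working resolution of each stage, compounding the assumed upscales."""
    sizes = []
    for stage in plan:
        width, height = width * stage.upscale, height * stage.upscale
        sizes.append((width, height))
    return sizes


check_plan(PLAN)
print(stage_resolutions(512, 288, PLAN))
```

Keeping the first two stages at the base resolution makes the lip-sync test pass cheap; only the final stage pays for the larger latent.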

Compared with ordinary image-to-video or video-to-video workflows, this graph is more specialized for re-speaking. A normal V2V workflow may preserve motion but may not properly follow the new speech. A normal talking-head workflow may follow audio but may weaken the original video identity. This workflow combines video reference guidance, LipDub control, audio latent tokens, negative guidance, staged rendering, and latent upscaling to balance identity preservation and mouth synchronization.

This workflow is suitable for dialogue replacement, AI dubbing previews, character re-speaking, translated video demonstrations, virtual host correction, short-form talking clips, product explanation videos, Bilibili demonstrations, YouTube content, RunningHub showcases, and Civitai workflow examples.

Main features:

  • LTX 2.3 video re-speaking workflow

  • Replace speech while preserving video identity

  • OmniNFT + Relay-style prompt control

  • Distilled 1.1 model route

  • LipDub IC LoRA for mouth and speech control

  • LTXVAudioVAEEncode for new audio encoding

  • LTXVSetAudioRefTokens audio token injection

  • LTXAddVideoICLoRAGuide video reference guidance

  • LTXVCropGuides stage alignment

  • Three-stage rendering pipeline

  • LTXVLatentUpsampler high-resolution refinement

  • Final audio decode and video output

Suggested workflow:

  • Prepare a clean source video first. The face should be visible, the mouth area should not be blocked, and the camera should not shake too aggressively.

  • Prepare a clean replacement audio file with stable volume and clear speech.

  • Load the video reference and the new audio into the workflow, then use a prompt that describes the speaker, lighting, framing, and natural speaking behavior.

  • Run a short test first to check mouth alignment, identity preservation, and facial stability.

  • If the mouth does not follow the audio strongly enough, increase the lip-sync guidance strength or simplify the visual prompt.

  • If the face changes too much, reduce aggressive motion language and rely more on the original video guide.

  • Once the first-stage result is stable, continue through the second and third refinement stages for cleaner output.
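The audio preparation advice in the suggested workflow (stable volume, clear speech) can be checked before a render. The sketch below reads a 16-bit PCM WAV with Python's standard library and reports the peak level, the fraction of clipped samples, and how much loudness varies across 0.5-second windows, ignoring pauses. The thresholds and function name are illustrative assumptions, not values from the workflow; MP3 or 24-bit files would need converting first.

```python
import array
import math
import sys
import wave


def audio_preflight(path: str, window_s: float = 0.5, silence_db: float = -50.0) -> dict:
    """Report basic level statistics for a 16-bit PCM WAV file (hypothetical helper)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM audio")
        channels = wav.getnchannels()
        rate = wav.getframerate()
        samples = array.array("h", wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    mono = samples[::channels]  # first channel only, enough for a quick check
    if not mono:
        raise ValueError("file contains no audio")

    full_scale = 32768.0
    peak = max(abs(s) for s in mono) / full_scale
    clipped = sum(1 for s in mono if abs(s) >= 32767) / len(mono)

    # RMS loudness (dBFS) per window; windows quieter than silence_db (pauses) are skipped.
    step = max(1, int(rate * window_s))
    levels = []
    for start in range(0, len(mono), step):
        chunk = mono[start:start + step]
        rms = math.sqrt(sum(s * s for s in chunk) / len(chunk)) / full_scale
        if rms > 0:
            db = 20 * math.log10(rms)
            if db > silence_db:
                levels.append(db)
    spread = max(levels) - min(levels) if levels else 0.0

    return {
        "peak": peak,
        "clipped_fraction": clipped,
        "loudness_spread_db": spread,
        # Illustrative thresholds (assumptions): no clipping, at most a 12 dB swing.
        "ok": clipped == 0 and spread <= 12.0,
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    print(audio_preflight(sys.argv[1]))
```

Run it as `python preflight.py replacement.wav`; a clip that fails on clipping or a large loudness swing is worth normalizing before it reaches LTXVAudioVAEEncode.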

⚙️ RunningHub Workflow

Try the workflow online right now — no installation required.
👉 Workflow: https://www.runninghub.ai/post/2057730271897800705?inviteCode=rh-v1111

If the results meet your expectations, you can later deploy it locally for customization.

🎁 Fan Benefits: Register to get 1000 points, plus 100 points for every daily login, and enjoy RTX 4090 performance with 48 GB of memory!

📺 Bilibili Updates (Mainland China & Asia-Pacific)

If you’re in mainland China or the Asia-Pacific region, you can watch the video below for the workflow demonstration and a breakdown of the creative approach.
📺 Bilibili Video: https://www.bilibili.com/video/BV1yRGj6XEaM/

☕ Support Me on Ko-fi

If you find my content helpful and want to support future creations, you can buy me a coffee ☕.
Every bit of support helps me keep creating — just like a spark that can ignite a blazing flame.
👉 Ko-fi: https://ko-fi.com/aiksk

💼 Business Contact

For collaboration or inquiries, please contact aiksk95 on WeChat.

I keep model resources updated on Quark Netdisk:
👉 https://pan.quark.cn/s/20c6f6f8d87b
These resources are mainly intended for local users, to support creation and learning.
