Name: LTX 2.3 Multi-Image Reference OmniNFT + Relay Video Fusion Workflow
Author: AIKSK

Watch the full video first if you want to understand how this LTX 2.3 multi-image reference workflow works in practice. The video shows how multiple images are fused into one video generation pipeline, why OmniNFT + Relay control matters, and how to launch the workflow online without rebuilding a complex local ComfyUI environment.

This ComfyUI workflow is designed for LTX 2.3 multi-image reference video generation, using OmniNFT, Relay-style prompt control, and the distilled 1.1 model route to create more controllable image-to-video results. Its main purpose is to let creators use several reference images at once instead of relying on a single starting frame. That makes it better suited to character consistency, multi-angle visual guidance, object continuity, environment reference, and cinematic video generation.

The workflow is built around the LTX 2.3 distilled 1.1 video route. It uses a Gemma3-based LTX text encoder, LTX video VAE, LTX Audio VAE, LTXVConditioning, LTX2_NAG for stronger negative guidance, and LTXVAddGuideMulti for multi-image reference control. The workflow also uses ManualSigmas, CFGGuider, RandomNoise, SamplerCustomAdvanced, LTXVConcatAVLatent, LTXVSeparateAVLatent, LTXVLatentUpsampler, tiled VAE decoding, and CreateVideo output.
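The ManualSigmas node replaces a scheduler with a hand-written list of noise levels, so a single typo can silently change the sampling. The snippet below is a minimal sketch. It assumes the sigmas are typed as a comma-separated string and that a usable schedule strictly decreases and ends at 0.0. `parse_manual_sigmas` is a hypothetical helper, and the values in the example are placeholders, not the schedule this workflow ships with.

```python
from typing import List


def parse_manual_sigmas(text: str) -> List[float]:
    """Parse a comma-separated sigma string and sanity-check it.

    Hypothetical helper: assumes sigmas must strictly decrease and
    finish at 0.0 so the sampler ends on a fully denoised latent.
    """
    # float() tolerates surrounding spaces; empty parts (trailing commas) are skipped.
    sigmas = [float(part) for part in text.split(",") if part.strip()]
    if len(sigmas) < 2:
        raise ValueError("need at least two sigma values (one sampling step)")
    for prev, cur in zip(sigmas, sigmas[1:]):
        if cur >= prev:
            raise ValueError(f"sigmas must strictly decrease: {prev} -> {cur}")
    if sigmas[-1] != 0.0:
        raise ValueError("last sigma should be 0.0")
    return sigmas


if __name__ == "__main__":
    # Placeholder schedule for illustration only.
    schedule = parse_manual_sigmas("1.0, 0.9, 0.7, 0.4, 0.0")
    print(f"{len(schedule) - 1} steps: {schedule}")
```

Each pair of adjacent sigmas is one sampling step, so the placeholder schedule runs four steps.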

The key node is LTXVAddGuideMulti. It lets multiple images act as visual guides across the video timeline. Each guide is assigned a frame index and a strength value, which control when a reference image takes effect and how strongly it shapes the output. For example, one image can define the main character, another the scene, another the clothing or object details, and another the visual direction of later frames.
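To make the frame index and strength idea concrete, here is a small planning sketch. It is not the LTXVAddGuideMulti implementation: the `Guide` type and `resolve_guides` function are hypothetical. It also assumes an LTX clip length of 8n + 1 frames, negative indices counted from the end of the clip, and indices rounded down to a multiple of 8 to match the VAE's temporal stride. Check the node's own tooltips before relying on those rules.

```python
from dataclasses import dataclass
from typing import List

TIME_STRIDE = 8  # assumed temporal compression of the LTX video VAE


@dataclass(frozen=True)
class Guide:
    name: str        # what the reference image is for (character, scene, ...)
    frame_idx: int   # pixel frame where the guide applies; negative = from the end
    strength: float  # 0.0 (ignored) .. 1.0 (strongest)


def resolve_guides(guides: List[Guide], num_frames: int) -> List[Guide]:
    """Normalize guide positions for a clip of num_frames frames (hypothetical helper)."""
    if num_frames < 1 or (num_frames - 1) % TIME_STRIDE != 0:
        raise ValueError("expected an assumed LTX frame count of 8*n + 1")
    resolved = []
    for g in guides:
        # Negative indices count back from the last frame.
        idx = g.frame_idx + num_frames if g.frame_idx < 0 else g.frame_idx
        if not 0 <= idx < num_frames:
            raise ValueError(f"{g.name}: frame {g.frame_idx} is outside the clip")
        if not 0.0 <= g.strength <= 1.0:
            raise ValueError(f"{g.name}: strength must be within [0, 1]")
        idx -= idx % TIME_STRIDE  # snap down to a multiple of the assumed stride
        resolved.append(Guide(g.name, idx, g.strength))
    # Order guides along the timeline.
    return sorted(resolved, key=lambda g: g.frame_idx)


if __name__ == "__main__":
    plan = [
        Guide("character", 0, 1.0),
        Guide("scene", 40, 0.6),
        Guide("later-frame direction", -1, 0.8),
    ]
    for g in resolve_guides(plan, num_frames=97):
        print(g)
```

With a 97-frame clip, the third guide's index of -1 resolves to frame 96, so the later-frame reference lands at the end of the clip.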

The workflow uses a three-stage rendering structure. The first stage sets up the initial composition and motion. The second stage handles latent continuation and latent-space expansion. The third stage performs high-resolution refinement after the latent upscaler. Splitting the work this way tends to be more stable than forcing the whole video through a single pass.
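The three stages can be sketched as a resolution plan. This sketch assumes, without confirmation from the graph itself, that stages 1 and 2 run at a base resolution, that the latent upsampler doubles width and height before stage 3, and that LTX dimensions must be multiples of 32. `plan_stages` and `snap` are hypothetical helpers.

```python
from typing import List, Tuple

SPATIAL_MULTIPLE = 32  # assumed: LTX width/height must be multiples of 32
UPSCALE_FACTOR = 2     # assumed: the latent upsampler doubles width and height


def snap(value: int, multiple: int = SPATIAL_MULTIPLE) -> int:
    """Round down to the nearest multiple, never below one multiple."""
    return max(multiple, value - value % multiple)


def plan_stages(target_w: int, target_h: int) -> List[Tuple[str, int, int]]:
    """Hypothetical three-stage plan: two base-resolution passes, then upscale and refine."""
    base_w = snap(target_w // UPSCALE_FACTOR)
    base_h = snap(target_h // UPSCALE_FACTOR)
    return [
        ("stage 1: composition and motion", base_w, base_h),
        ("stage 2: latent continuation", base_w, base_h),
        ("stage 3: refinement after latent upscale",
         base_w * UPSCALE_FACTOR, base_h * UPSCALE_FACTOR),
    ]


if __name__ == "__main__":
    for name, w, h in plan_stages(1280, 720):
        print(f"{name}: {w}x{h}")
```

A 1280x720 target halves to 640x360, but 360 is not a multiple of 32, so the base snaps to 640x352 and the refined output comes out at 1280x704. Choosing a target whose halves are already multiples of 32 (for example 1280x768) avoids that mismatch.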

Compared with ordinary image-to-video workflows, this graph gives creators stronger control over visual continuity. A typical single-image workflow can struggle to keep identity, clothing, and background logic consistent, or to tell a story across several references. Giving the model several visual anchors makes this workflow better suited to character videos, cinematic shots, MV fragments, product-style videos, and advanced LTX 2.3 demonstrations.

The final output can be decoded through tiled VAE decoding, combined with audio through CreateVideo, and prepared for publishing or further editing. This makes the workflow useful not only for testing, but also for practical video production.
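Tiled VAE decoding keeps memory bounded by decoding the output in overlapping pieces and blending the overlaps. The function below is a generic one-axis illustration of that idea, not the code behind ComfyUI's tiled decode node. `plan_tiles` is a hypothetical helper, and the tile and overlap values in the example are placeholders.

```python
from typing import List, Tuple


def plan_tiles(length: int, tile: int, overlap: int) -> List[Tuple[int, int]]:
    """Split [0, length) into overlapping (start, end) spans.

    Every neighbouring pair of spans overlaps by at least `overlap`,
    so a decoder can blend the shared region to hide seams.
    """
    if tile <= overlap:
        raise ValueError("tile must be larger than overlap")
    if length <= tile:
        return [(0, length)]  # small enough to decode in one piece
    step = tile - overlap
    spans = []
    start = 0
    while start + tile < length:
        spans.append((start, start + tile))
        start += step
    spans.append((length - tile, length))  # last tile aligned to the edge
    return spans


if __name__ == "__main__":
    print(plan_tiles(1280, tile=512, overlap=64))
```

The same planning applies per axis (width, height, and, for video, time). A larger overlap hides seams better at the cost of decoding more of the output twice.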

Main features:

LTX 2.3 multi-image reference video workflow
OmniNFT + Relay-style multi-reference control
Distilled 1.1 model route for practical generation
LTXVAddGuideMulti frame index and strength control
Multiple image references for character, scene, object, and style guidance
LTX2_NAG negative guidance support
Three-stage rendering structure
ManualSigmas and SamplerCustomAdvanced control
LTXVLatentUpsampler high-resolution refinement
Audio-video latent concatenation and separation
Tiled VAE decoding
CreateVideo final output
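In graphs like this one, audio-video latent concatenation typically lets a single sampler pass work on both streams, after which they are split apart again for their own decoders. The following is only a conceptual sketch with plain Python lists: the real LTXVConcatAVLatent and LTXVSeparateAVLatent nodes work on tensors whose layout is not described here, and `AVLatent`, `concat_av`, and `separate_av` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AVLatent:
    """Hypothetical joint latent: one flat token list plus where the video part ends."""
    tokens: List[float]
    video_len: int


def concat_av(video: List[float], audio: List[float]) -> AVLatent:
    """Join video and audio tokens into one sequence, recording the split point."""
    return AVLatent(tokens=list(video) + list(audio), video_len=len(video))


def separate_av(latent: AVLatent) -> Tuple[List[float], List[float]]:
    """Split the joint sequence back into video and audio at the recorded point."""
    return latent.tokens[:latent.video_len], latent.tokens[latent.video_len:]


if __name__ == "__main__":
    joint = concat_av([0.1, 0.2, 0.3], [0.9, 0.8])
    # A sampler would update joint.tokens here without changing its length.
    video, audio = separate_av(joint)
    print(len(video), len(audio))
```

The point of the sketch is that separation depends on the split recorded at concatenation time, so the video and audio lengths must not change between the two steps.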

Suggested workflow:

Prepare several clear reference images first: one for the main character, one for the environment, one for clothing or object details, and one for the visual direction of later frames. Load them into the reference image inputs, then check the frame index and guide strength values inside LTXVAddGuideMulti. Start with a short test render to confirm that the references are being fused correctly. If the output becomes chaotic, reduce guide strength or simplify the prompt. If a reference is too weak, increase its strength or move its frame index closer to the target moment. Once the base composition is stable, continue to latent upscaling and the final high-resolution refinement.
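The tuning advice for guide strength and frame index can be written down as a small decision rule. This is a hypothetical sketch: `GuideSetting`, `adjust_guide`, the 0.1 strength step, and the 8-frame move are illustrative choices, not values taken from the workflow. It lowers strength when the output is chaotic and, when a reference is too weak, raises strength first and only moves the frame index toward the target once strength is already at 1.0.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class GuideSetting:
    frame_idx: int   # pixel frame where the reference applies
    strength: float  # guide strength, kept within [0, 1]


def adjust_guide(g: GuideSetting, feedback: str, target_frame: int,
                 step: float = 0.1, frame_step: int = 8) -> GuideSetting:
    """One round of manual tuning (hypothetical helper).

    feedback == "chaotic": lower strength by `step`.
    feedback == "weak": raise strength by `step`; if strength is already
    1.0, move frame_idx up to `frame_step` frames toward target_frame.
    Any other feedback leaves the guide unchanged.
    """
    if feedback == "chaotic":
        return replace(g, strength=max(0.0, round(g.strength - step, 3)))
    if feedback == "weak":
        if g.strength < 1.0:
            return replace(g, strength=min(1.0, round(g.strength + step, 3)))
        # Strength is maxed out, so shift the guide toward the target moment.
        delta = target_frame - g.frame_idx
        move = max(-frame_step, min(frame_step, delta))
        return replace(g, frame_idx=g.frame_idx + move)
    return g


if __name__ == "__main__":
    g = GuideSetting(frame_idx=40, strength=0.8)
    for feedback in ["weak", "weak", "weak"]:
        g = adjust_guide(g, feedback, target_frame=64)
        print(feedback, g)
```

Changing one variable per test render, as this rule does, makes it easier to tell which adjustment actually fixed the result.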

⚙️ RunningHub Workflow

Try the workflow online right now. No installation required.
👉 Workflow: https://www.runninghub.ai/post/2058327091539701762?inviteCode=rh-v1111

If the results meet your expectations, you can later deploy it locally for customization.

🎁 Fan Benefits: Register to get 1000 points, plus 100 points for every daily login, and enjoy 4090 performance with 48 GB of super power!

📺 Bilibili Updates (Mainland China & Asia-Pacific)

If you're in mainland China or the Asia-Pacific region, you can watch the video below for the workflow demonstration and creative breakdown.
📺 Bilibili Video: https://www.bilibili.com/video/BV1yRGj6XEaM/

☕ Support Me on Ko-fi

If you find my content helpful and want to support future creations, you can buy me a coffee ☕.
Every bit of support helps me keep creating, just like a spark that can ignite a blazing flame.
👉 Ko-fi: https://ko-fi.com/aiksk

💼 Business Contact

For collaboration or inquiries, please contact aiksk95 on WeChat.

I keep updating model resources on Quark Netdisk:
👉 https://pan.quark.cn/s/20c6f6f8d87b
These resources are mainly for local users, to support creation and learning.

Model type: Workflow
Base model: LTXV 2.3
Published: 2026-05-25
