Name: LTX 2.3 Text to Video OmniNFT + Relay Three-Stage No-Subtitle Workflow
Author: AIKSK

Watch the full video first if you want to understand how this LTX 2.3 text-to-video workflow works in practice. The video shows how a clean prompt can be turned into a complete video clip, how the three-stage rendering structure improves stability, and how to launch the workflow online without rebuilding the full ComfyUI environment locally.

This ComfyUI workflow is designed for LTX 2.3 text-to-video generation, using OmniNFT, Relay-style prompt control, and the distilled 1.1 model route to create clean video outputs from text prompts. The main purpose of this workflow is to make text-to-video generation more controllable, more stable, and more suitable for publishing, especially when users want a no-subtitle, no-watermark, no-extra-text output.

The workflow is built around the LTX 2.3 distilled 1.1 generation route. It uses an LTX 2.3 checkpoint, Gemma3-based text encoding, LTXVConditioning, EmptyLTXVLatentVideo, LTXVEmptyLatentAudio, LTX2_NAG negative guidance, ManualSigmas, CFGGuider, SamplerCustomAdvanced, LTXVLatentUpsampler, VAEDecodeTiled, LTXVAudioVAEDecode, and CreateVideo. The graph also includes seed control, fps control, universal negative prompting, VRAM management, and audio-video latent handling.
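If you export the graph in ComfyUI's API format (a JSON object mapping node ids to `class_type` and `inputs`), seed and fps can be changed from a script instead of the UI. The following is a minimal sketch, not the author's tooling. The helper `set_input` is hypothetical, and the input names `frame_rate` (on LTXVConditioning) and `noise_seed` (on a RandomNoise node) are assumptions that should be checked against the exported file, since this workflow may route seed and fps through different nodes.

```python
import copy


def set_input(workflow, class_type, input_name, value):
    """Return (patched_copy, count) with `input_name` set on matching nodes.

    Hypothetical helper. Only literal inputs are changed; inputs wired to
    another node appear as [node_id, output_index] lists and are left alone.
    """
    patched = copy.deepcopy(workflow)
    count = 0
    for node in patched.values():
        if node.get("class_type") != class_type:
            continue
        inputs = node.get("inputs", {})
        if input_name not in inputs or isinstance(inputs[input_name], list):
            continue
        inputs[input_name] = value
        count += 1
    return patched, count


# Toy two-node fragment for illustration only, not the real graph.
toy = {
    "1": {"class_type": "LTXVConditioning",
          "inputs": {"frame_rate": 25.0, "positive": ["5", 0]}},
    "2": {"class_type": "RandomNoise", "inputs": {"noise_seed": 0}},
}

patched, changed = set_input(toy, "LTXVConditioning", "frame_rate", 24.0)
patched, _ = set_input(patched, "RandomNoise", "noise_seed", 12345)
```

Working on a copy keeps the exported file untouched, so the same base workflow can be reused for several seeds.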

The core idea is to generate video from text while maintaining stronger control over structure, motion, and final image quality. The positive prompt defines the subject, action, camera movement, lighting, environment, atmosphere, and cinematic direction. The negative prompt is designed to suppress common LTX video problems, including low quality, flicker, unstable perspective, identity drift, broken anatomy, subtitles, captions, UI overlays, logos, watermarks, unreadable text, and unwanted audio artifacts.
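The prompt structure described above can be sketched as a small helper that assembles the positive prompt from its parts and keeps a reusable negative list. This is an illustrative sketch only: `ShotPrompt` and `negative` are hypothetical names, and the negative terms are paraphrased from this description rather than copied from the workflow's actual negative prompt node.

```python
from dataclasses import dataclass

# Terms paraphrased from the description; the workflow's real negative
# prompt may be worded differently.
NEGATIVE_TERMS = [
    "low quality", "flicker", "unstable perspective", "identity drift",
    "broken anatomy", "subtitles", "captions", "UI overlays", "logos",
    "watermarks", "unreadable text", "audio artifacts",
]


@dataclass
class ShotPrompt:
    """Hypothetical container for the parts of a positive prompt."""
    subject: str
    action: str
    camera: str
    lighting: str
    environment: str
    atmosphere: str
    style: str = "cinematic"

    def positive(self) -> str:
        parts = [self.subject, self.action, self.camera, self.lighting,
                 self.environment, self.atmosphere, self.style]
        # Skip empty fields so the prompt has no dangling commas.
        return ", ".join(p.strip() for p in parts if p.strip())


def negative(extra=()):
    """Join the base negative terms with extras, dropping case-insensitive repeats."""
    kept, seen = [], set()
    for term in [*NEGATIVE_TERMS, *extra]:
        key = term.lower()
        if key not in seen:
            seen.add(key)
            kept.append(term)
    return ", ".join(kept)


shot = ShotPrompt("a red fox", "running through snow", "slow tracking shot",
                  "golden hour light", "pine forest", "calm")
```

Keeping each field to one idea makes it easier to see which part of the prompt to change when a test render goes wrong.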

The workflow uses a three-stage rendering structure. The first stage focuses on initial composition and motion foundation. It creates the base video latent and establishes the main visual direction. The second stage performs latent-space upscaling and refinement, allowing the workflow to improve structure and detail without rebuilding the whole video from scratch. The third stage applies final high-resolution polish, using another controlled sampling pass before tiled VAE decoding and video assembly.
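To make the three-stage structure concrete, the sketch below computes the resolution each stage would work at. Several things here are assumptions, not facts from the workflow: that the latent upsampler doubles width and height, that only the step into stage 2 upsamples (the default `scales=(2, 1)`), and that the divisible-by-32 and 8n+1 frame-count constraints documented for earlier LTX-Video releases still apply. Adjust the values to match the actual graph.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    width: int
    height: int


def snap(value, multiple=32):
    """Round down to a multiple of 32 (assumed size constraint)."""
    return max(multiple, (value // multiple) * multiple)


def snap_frames(frames):
    """Round down to the nearest 8n+1 frame count (assumed constraint)."""
    return max(1, ((frames - 1) // 8) * 8 + 1)


def plan_stages(base_width, base_height, scales=(2, 1)):
    """List the working resolution of each of the three stages.

    `scales` holds the assumed upscale factor applied before stage 2
    and before stage 3.
    """
    names = ["composition and motion", "latent upscale and refine",
             "final polish"]
    w, h = snap(base_width), snap(base_height)
    stages = [Stage(names[0], w, h)]
    for name, factor in zip(names[1:], scales):
        w, h = w * factor, h * factor
        stages.append(Stage(name, w, h))
    return stages


stages = plan_stages(770, 450)
```

Starting from a small base size keeps the first stage cheap, which is where most prompt iteration happens.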

Compared with ordinary text-to-video workflows, this graph is more production-oriented. A simple one-pass T2V workflow may be fast, but it often suffers from weak motion, unstable composition, flicker, random text artifacts, or poor detail. This workflow separates generation into clear stages, uses negative guidance to suppress unwanted subtitles and watermarks, and applies latent upscaling before final output. That makes it more useful for creators who need cleaner video results for tutorials, showcases, social media, and workflow publishing.

This workflow is suitable for cinematic shots, AI short clips, fantasy scenes, product-style motion, character motion tests, visual concept videos, MV fragments, Bilibili demonstrations, YouTube content, RunningHub showcases, and Civitai workflow examples. It is especially useful when you want to start from pure text but still keep the output clean and structured.

Main features:

LTX 2.3 text-to-video workflow
OmniNFT + Relay-style prompt control
Distilled 1.1 model route
Clean output with no subtitles or watermarks
Gemma3 text encoder route
LTXVConditioning at a controlled frame rate
Empty video latent and audio latent structure
LTX2_NAG negative guidance support
Universal negative prompt for clean video output
Three-stage rendering pipeline
LTXVLatentUpsampler high-resolution refinement
VAEDecodeTiled and CreateVideo final output

Suggested workflow:

Start with a clear text prompt. Define the subject, main action, camera movement, lighting, environment, and desired video style. Keep the first test short and avoid overloading the prompt with too many competing actions. If the output contains unwanted text, subtitles, logos, or unstable artifacts, strengthen the negative prompt and simplify the scene. If the motion is too weak, make the action and camera direction more explicit. If the composition is good but the detail is not enough, continue through the latent upscaling and final refinement stages. Once the three-stage result is stable, export the video and use it directly for publishing or further editing.
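The troubleshooting advice above can be written down as a simple lookup, which is handy when reviewing a batch of test renders. This is only a sketch: the symptom keys and the `next_steps` helper are hypothetical names, and the adjustments are paraphrased from the paragraph above.

```python
# Symptom -> adjustments, paraphrased from the suggested workflow.
ADJUSTMENTS = {
    "unwanted_text": ["strengthen the negative prompt", "simplify the scene"],
    "weak_motion": ["make the action more explicit",
                    "make the camera direction more explicit"],
    "low_detail": ["continue through latent upscaling",
                   "run the final refinement stage"],
}


def next_steps(symptoms):
    """Collect adjustments for the observed symptoms, without repeats."""
    steps = []
    for symptom in symptoms:
        for step in ADJUSTMENTS.get(symptom, []):
            if step not in steps:
                steps.append(step)
    # No known symptoms: the result is ready to export.
    return steps or ["export the video"]
```

For example, `next_steps(["weak_motion"])` returns the two prompt changes for motion, and an empty list of symptoms returns the export step.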

⚙️ RunningHub Workflow

Try the workflow online right now — no installation required.
👉 Workflow: https://www.runninghub.ai/post/2057671098963161090?inviteCode=rh-v1111

If the results meet your expectations, you can later deploy it locally for customization.

🎁 Fan Benefits: Register to get 1000 points, plus 100 points for each daily login, and run the workflow on a 4090 with 48 GB.

📺 Bilibili Updates (Mainland China & Asia-Pacific)

If you’re in Mainland China or the Asia-Pacific region, you can watch the video below to see the workflow demonstration and creative breakdown.
📺 Bilibili Video: https://www.bilibili.com/video/BV1yRGj6XEaM/

☕ Support Me on Ko-fi

If you find my content helpful and want to support future creations, you can buy me a coffee ☕.
Every bit of support helps me keep creating — just like a spark that can ignite a blazing flame.
👉 Ko-fi: https://ko-fi.com/aiksk

💼 Business Contact

For collaboration or inquiries, please contact aiksk95 on WeChat.

I keep model resources updated on Quark Netdisk:
👉 https://pan.quark.cn/s/20c6f6f8d87b
These resources are mainly for local users, for creation and learning.

Model type	Workflow
Base model	LTXV 2.3
Published	2026-05-25
