LTX 2.3 Three-Stage Image-to-Video Performance Workflow

Model description

This workflow runs LTX 2.3 image-to-video generation through a three-stage sampling structure. It takes a single reference image, preserves the subject identity and scene composition, and passes the result through several LTX 2.3 latent refinement stages. The aim is a final video with stronger motion control, better texture, cleaner detail, and more stable visual continuity.

Compared with a basic one-pass image-to-video workflow, this setup is built to get more out of LTX 2.3. A one-pass I2V workflow can produce a quick motion preview, but it may suffer from weak movement, blurry details, unstable faces, background drift, body deformation, or sudden scene changes. This workflow uses staged sampling: the first stage builds the main motion, and additional lower-sigma passes then refine the generated latent. The goal is not only to make the image move, but to turn it into a more polished cinematic video.

The workflow uses ltx-2.3-22b-dev as the main video model, with Gemma 3 12B text encoding, the LTX video VAE, and the LTX audio VAE. Its node chain includes LTXVConditioning, LTXVPreprocess, EmptyLTXVLatentVideo, LTXVEmptyLatentAudio, LTXVImgToVideoConditionOnly, LTXVConcatAVLatent, LTXVSeparateAVLatent, SamplerCustomAdvanced, ManualSigmas, and LTXVLatentUpsampler, followed by tiled VAE decoding, audio decoding, and final video export. It also loads ltx-2.3-22b-distilled-lora-384 to strengthen the model's behavior during the staged generation process.

The core advantage is the three-stage render pipeline. The first stage uses a wider sigma schedule to establish the base video motion from the input image and text prompt. The later stages continue from the generated latent and use lower sigma values such as 0.85, 0.7250, 0.4219, and 0.0. Because these passes start below full noise, they refine the video without completely destroying the original image structure. In practical terms, the first pass sets the motion direction, while the following passes improve detail, stability, and final output quality.
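As a rough illustration of the staged schedules, the sketch below checks that a sigma list strictly decreases to 0.0 and formats it as a comma-separated string, which is the form a manual sigma node would typically take. The refinement values are the ones quoted above. The first-stage list is a made-up placeholder, and the helper names are hypothetical, so read the real values from the workflow itself.

```python
from typing import List, Sequence

# Refinement-pass sigma values quoted in the description.
REFINE_SIGMAS: List[float] = [0.85, 0.7250, 0.4219, 0.0]

# ASSUMPTION: placeholder first-stage schedule for illustration only.
# The actual first-stage values are not stated in the description.
STAGE_ONE_SIGMAS: List[float] = [1.0, 0.95, 0.9, 0.85, 0.725, 0.4219, 0.0]


def validate_sigmas(sigmas: Sequence[float]) -> None:
    """Raise ValueError unless the schedule strictly decreases and ends at 0.0."""
    if len(sigmas) < 2:
        raise ValueError("a schedule needs at least two sigma values")
    for prev, curr in zip(sigmas, sigmas[1:]):
        if curr >= prev:
            raise ValueError(f"sigmas must strictly decrease: {prev} -> {curr}")
    if sigmas[-1] != 0.0:
        raise ValueError("the final sigma should be 0.0")


def sampling_steps(sigmas: Sequence[float]) -> int:
    """Each adjacent pair of sigmas is one sampler step."""
    return len(sigmas) - 1


def to_sigma_string(sigmas: Sequence[float]) -> str:
    """Format a schedule as a comma-separated string."""
    return ", ".join(str(s) for s in sigmas)


if __name__ == "__main__":
    for name, schedule in [("stage 1", STAGE_ONE_SIGMAS), ("refine", REFINE_SIGMAS)]:
        validate_sigmas(schedule)
        print(name, sampling_steps(schedule), "steps:", to_sigma_string(schedule))
```

A refinement pass that starts at 0.85 instead of 1.0 keeps part of the incoming latent, which is why the later stages adjust detail rather than regenerate the scene.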

The workflow also includes latent upscaling through LTXVLatentUpsampler. Instead of only upscaling the final decoded frames, it raises the resolution in latent space before final decoding. This is useful for LTX 2.3 I2V because face detail, clothing texture, lighting, environment structure, and camera stability all depend heavily on the latent video before it becomes visible frames. Latent refinement can help the final output feel less like a rough AI preview and more like a publishable video clip.
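The size arithmetic behind latent upscaling can be sketched as follows. Both numbers used here are assumptions, not values taken from this workflow: a spatial VAE compression factor of 32 and a 2x latent upsampler. The function names are hypothetical. Check the actual node settings before relying on these figures.

```python
from typing import Tuple

Size = Tuple[int, int]  # (width, height)


def latent_size(pixels: Size, vae_scale: int = 32) -> Size:
    """Pixel size -> latent size. vae_scale=32 is an ASSUMED compression factor."""
    width, height = pixels
    if width % vae_scale or height % vae_scale:
        raise ValueError(f"{width}x{height} is not divisible by {vae_scale}")
    return width // vae_scale, height // vae_scale


def upsample_latent(latent: Size, factor: int = 2) -> Size:
    """Latent size after upsampling. factor=2 is an ASSUMED upsampler ratio."""
    width, height = latent
    return width * factor, height * factor


def decoded_size(latent: Size, vae_scale: int = 32) -> Size:
    """Latent size -> decoded pixel size under the same assumed VAE scale."""
    width, height = latent
    return width * vae_scale, height * vae_scale


if __name__ == "__main__":
    base = latent_size((384, 256))   # low-resolution first stage
    upscaled = upsample_latent(base)  # upsampled before the later passes
    print(base, upscaled, decoded_size(upscaled))
```

Under these assumptions, a 384x256 first stage ends up decoding at 768x512, and the later sampling passes operate on the larger latent rather than on decoded frames.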

The example prompt in the workflow focuses on a fantasy cinematic scene with an adult silver-haired female warrior riding a massive black beast, holding the hand of a silver-haired fox-ear woman inside a purple-red magic hall. The prompt includes character identity, costume, creature design, floating magic cards, smoke, energy effects, hair movement, cloth motion, mouth movement, expression changes, and camera stability. This makes the workflow a strong test case for complex character interaction, fantasy atmosphere, and controlled image-to-video animation.

The negative prompt suppresses common AI video failures such as low resolution, blur, grain, static frames, no movement, subtitles, overlays, watermarks, scene cuts, scene transitions, warping, extra hands, extra limbs, and unstable body parts. These controls are important because image-to-video generation can easily break when the prompt includes multiple characters, fantasy props, hair, cloth, creature anatomy, lighting effects, and speaking motion.
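For readers who keep prompts in scripts, here is a minimal sketch of assembling a negative prompt from the failure terms listed above. The helper name is hypothetical and the exact wording in the workflow's negative prompt may differ; this only shows one way to keep the list deduplicated and easy to edit.

```python
from typing import Iterable, List

# Failure modes named in the description above.
NEGATIVE_TERMS: List[str] = [
    "low resolution", "blur", "grain", "static frames", "no movement",
    "subtitles", "overlays", "watermarks", "scene cuts", "scene transitions",
    "warping", "extra hands", "extra limbs", "unstable body parts",
]


def build_negative_prompt(terms: Iterable[str]) -> str:
    """Join terms with commas, dropping blanks and case-insensitive duplicates."""
    seen = set()
    cleaned = []
    for term in terms:
        normalized = term.strip().lower()
        if normalized and normalized not in seen:
            seen.add(normalized)
            cleaned.append(normalized)
    return ", ".join(cleaned)


if __name__ == "__main__":
    print(build_negative_prompt(NEGATIVE_TERMS))
```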

This workflow is ideal for creators who want to test LTX 2.3 image-to-video generation beyond a simple first-pass output. It can be used for cinematic character animation, fantasy image animation, AI short drama clips, talking character shots, creature scenes, stylized motion tests, RunningHub demos, Civitai previews, and YouTube / Bilibili workflow showcases. If you want to see how the input image, LTX 2.3 model route, three-stage sampler passes, latent upscaling, tiled decoding, and final video export work together, watch the full tutorial from the YouTube link above.

⚙️ Try the Workflow Online

👉 Workflow: https://www.runninghub.ai/post/2033775150256103426/?inviteCode=rh-v1111

Open the link above to run the workflow directly online and view the generation results in real time.

If the results meet your expectations, you can also deploy it locally for further customization.

🎁 Fan Benefits: Register now to get 1000 points, plus 100 points for each daily login, and enjoy 4090-level performance and 48 GB of powerful compute!

📺 Bilibili Updates (Mainland China & Asia-Pacific)

If you are in Mainland China or the Asia-Pacific region, you can watch the video below for workflow demos and a detailed creative breakdown.

📺 Bilibili Video: https://www.bilibili.com/video/BV1t9woz8Ev5/

I will continue updating model resources on Quark Drive:

👉 https://pan.quark.cn/s/20c6f6f8d87b

These resources are mainly prepared for local users, making creation and learning more convenient.
