This workflow is designed for LTX-2 three-stage digital human video generation. It is built around a “1 + 3” rendering structure that aims to get more out of the model, keep the face more stable, and raise final video quality. It takes a reference image and an audio input, generates a stable talking digital human clip, and then progressively refines the result through multiple LTX latent stages instead of relying on a single rough render pass.

The workflow uses an LTX video generation route with image-to-video conditioning, audio VAE encoding, audio-video latent routing, staged sampling, latent upscaling, tiled VAE decoding, and final video export. This makes it better suited to digital human production than a basic image-to-video graph: it is designed to preserve the subject's identity while still letting the character speak, move naturally, and keep consistent framing across the full clip.

The “1 + 3” structure is the core of this workflow. The first stage builds the base digital human video from the source image, prompt, and audio latent. Three additional refinement passes then continue working on the generated latent, improving texture, face quality, motion stability, lighting consistency, and final output sharpness. This staged design helps reduce common problems such as blurry faces, weak mouth motion, unstable expressions, noisy textures, and erratic body movement.
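
The staged structure can be sketched in plain Python as one base pass followed by three refinement passes over the same latent. This only illustrates the control flow, not the actual node graph: `Stage`, `run_one_plus_three`, the stage names, and the `denoise` values are hypothetical, and `toy_sampler` stands in for a real sampler.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Latent = List[float]  # stand-in for a real audio-video latent tensor


@dataclass
class Stage:
    name: str                               # hypothetical label
    denoise: float                          # hypothetical strength, not read from the workflow
    run: Callable[[Latent, float], Latent]  # stand-in for a sampler call


def run_one_plus_three(initial: Latent, base: Stage,
                       refiners: List[Stage]) -> Tuple[Latent, List[str]]:
    """Run one base pass, then exactly three refinement passes on its output."""
    if len(refiners) != 3:
        raise ValueError("the 1 + 3 structure expects exactly three refinement stages")
    latent = base.run(initial, base.denoise)
    order = [base.name]
    for stage in refiners:
        # Each refinement stage consumes the previous stage's latent.
        latent = stage.run(latent, stage.denoise)
        order.append(stage.name)
    return latent, order


def toy_sampler(latent: Latent, denoise: float) -> Latent:
    """Toy stand-in: shrinks each value in proportion to the denoise strength."""
    return [value * (1.0 - denoise) for value in latent]


base = Stage("base", 0.5, toy_sampler)
refiners = [Stage(f"refine_{i}", 0.5, toy_sampler) for i in (1, 2, 3)]
result, order = run_one_plus_three([1.0, 2.0], base, refiners)
```

The point of the sketch is that every later pass starts from the previous pass's latent rather than from a fresh render.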

The workflow also includes LTX spatial latent upscaling through the ltx-2-spatial-upscaler model. Instead of only enlarging decoded frames after generation, the workflow refines the video in latent space, which can help preserve more structural quality before final decoding. This is especially useful for digital human clips because facial detail, eye stability, mouth shape, clothing texture, and background consistency are all sensitive to low-resolution generation artifacts.
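
The shape arithmetic behind a latent-space upscale can be sketched as below. The compression factors (32 spatial, 8 temporal) and the 2x upscale factor are assumptions chosen for illustration, not values read from this workflow, and `latent_grid` and `decoded_size_after_upscale` are hypothetical helper names.

```python
from typing import Tuple


def latent_grid(width: int, height: int, frames: int,
                spatial: int = 32, temporal: int = 8) -> Tuple[int, int, int]:
    """Return (latent_frames, latent_height, latent_width) for a pixel-space clip.

    spatial and temporal are assumed VAE compression factors.
    """
    latent_frames = 1 + (frames - 1) // temporal
    return latent_frames, height // spatial, width // spatial


def decoded_size_after_upscale(grid: Tuple[int, int, int], scale: int = 2,
                               spatial: int = 32) -> Tuple[int, int]:
    """Return (width, height) in pixels after a spatial-only latent upscale."""
    _, latent_h, latent_w = grid
    return latent_w * scale * spatial, latent_h * scale * spatial


grid = latent_grid(width=768, height=512, frames=121)
final_size = decoded_size_after_upscale(grid)
```

In this sketch the frame axis is untouched: a spatial upscale only changes the height and width of the latent grid, so the decoded clip keeps its frame count.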

The audio route is another important part of the setup. Audio is encoded into latent form and connected with the video latent path, allowing the workflow to generate a complete audio-video digital human result rather than a silent animation. The workflow also calculates frame count based on audio duration and FPS, making it more practical for talking-avatar clips, AI presenters, narration videos, product explainers, short-form social content, and virtual host demos.
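
The frame count calculation can be sketched as follows. This is a minimal sketch under assumptions: `frames_for_audio` is a hypothetical helper, and rounding up to a count of the form 8k + 1 is an assumed constraint for LTX-style video latents rather than something stated in this workflow. Passing `multiple=1` disables that rounding.

```python
import math


def frames_for_audio(duration_s: float, fps: float, multiple: int = 8) -> int:
    """Return a frame count that covers the audio clip.

    The result is rounded up to the form k * multiple + 1 (assumed constraint).
    """
    if duration_s < 0 or fps <= 0 or multiple < 1:
        raise ValueError("duration must be >= 0, fps > 0, and multiple >= 1")
    raw = math.ceil(duration_s * fps)               # frames needed to cover the audio
    blocks = math.ceil(max(raw - 1, 0) / multiple)  # whole blocks after the first frame
    return blocks * multiple + 1


frames = frames_for_audio(duration_s=5.0, fps=24)
```

Rounding up rather than down means the video is never shorter than the audio, at the cost of a few trailing frames.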

The negative prompt is focused on suppressing common video failures such as low resolution, blurry frames, static output, no movement, subtitles, overlays, watermarks, scene cuts, scene transitions, warping, extra hands, extra limbs, and unstable body parts. This is important for digital human workflows because the goal is not exaggerated animation, but controlled performance: stable face, natural mouth movement, clear expression, steady camera, and usable final output.
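
As an illustration, the failure categories listed above could be assembled into a negative prompt string like this. The exact wording and order of the workflow's real negative prompt are not shown here, so the term list and the `build_negative_prompt` helper are assumptions.

```python
from typing import Iterable

# Terms taken from the failure categories described in the text.
NEGATIVE_TERMS = [
    "low resolution", "blurry frames", "static output", "no movement",
    "subtitles", "overlays", "watermarks", "scene cuts", "scene transitions",
    "warping", "extra hands", "extra limbs", "unstable body parts",
]


def build_negative_prompt(terms: Iterable[str]) -> str:
    """Join terms into a comma-separated prompt, dropping blanks and duplicates."""
    seen = set()
    cleaned = []
    for term in terms:
        key = term.strip().lower()
        if key and key not in seen:
            seen.add(key)
            cleaned.append(key)
    return ", ".join(cleaned)


negative_prompt = build_negative_prompt(NEGATIVE_TERMS)
```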

This workflow is ideal for creators who want to test LTX-2 digital human generation with a stronger staged-render pipeline. It can be used for AI talking avatars, virtual hosts, social media presenters, educational narration, product explanation videos, character dialogue clips, Civitai previews, and RunningHub workflow demonstrations. If you want to see how the reference image, audio latent route, “1 + 3” staged rendering, LTX latent upscaling, and final digital human export work together, watch the full tutorial from the YouTube link above.

⚙️ Try the Workflow Online

👉 Workflow: https://www.runninghub.ai/post/2034922123667447810/?inviteCode=rh-v1111

Open the link above to run the workflow directly online and view the generation results in real time.

If the results meet your expectations, you can also deploy it locally for further customization.

🎁 Fan Benefits: Register now to get 1000 points, plus 100 points for each daily login, and enjoy 4090-level performance with 48 GB of compute!

📺 Bilibili Updates (Mainland China & Asia-Pacific)

If you are in Mainland China or the Asia-Pacific region, you can watch the video below for workflow demos and a detailed creative breakdown.

📺 Bilibili Video: https://www.bilibili.com/video/BV1HgApzdEm3/

I will continue updating model resources on Quark Drive:

👉 https://pan.quark.cn/s/20c6f6f8d87b

These resources are mainly prepared for local users, making creation and learning more convenient.

Model Type	Workflows
Base Model	LTXV 2.3
Published	2026-05-13

LTX-2 Three-Stage 1+3 Digital Human Performance Workflow
