LongCat Avatar Single-Image Looping Digital Human Workflow

Model description

Watch the full video first if you want to understand how this LongCat Avatar workflow works in practice. The video shows how one reference image and one audio track can be turned into a longer talking-avatar video, how loop continuation is handled, and how to launch the workflow online without rebuilding the full ComfyUI environment locally.

This ComfyUI workflow is designed for LongCat Avatar single-image looping digital human generation. Its main purpose is to take one character image and one driving audio file, then generate a talking-avatar video that can extend beyond a short first segment through loop-based continuation. Instead of repeatedly creating disconnected talking-head clips, this workflow builds a more continuous production route for long-form avatar narration, virtual hosting, and audio-driven digital human content.

The workflow is built around LongCat-Avatar-15_bf16.safetensors as the main avatar model. It also uses LongCat-Avatar-15_dmd_distill_lora_rank128_bf16.safetensors as the DMD acceleration LoRA, together with the WanVideoWrapper generation nodes, the WanVideo VAE, the WanVideo scheduler, the Whisper large v3 encoder, and the LongCat Avatar embed extension. The audio is processed through Whisper to extract speech features, so the avatar's mouth movement can follow the timing and rhythm of the input voice.

The visual side is driven by a single reference image. The image is resized into the target video format, encoded into a WanVideo latent, and used as the identity and appearance anchor for the avatar. This reference image controls the face, clothing, framing, lighting, background, and general visual style. Because the workflow focuses on one image, it is easier to keep the character stable than in a multi-shot switching setup.
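
The resize step can be illustrated with a small sketch. The workflow's actual resize settings are not listed here, so everything below is an assumption: the function name `fit_resolution`, the 832x480 pixel budget, and the requirement that each side be a multiple of 16 are illustrative choices, not values read from the graph.

```python
def fit_resolution(src_w: int, src_h: int,
                   target_area: int = 832 * 480,  # assumed pixel budget
                   multiple: int = 16) -> tuple[int, int]:
    """Scale an image size to roughly target_area pixels, keeping the
    aspect ratio and snapping each side to a multiple of `multiple`."""
    if src_w <= 0 or src_h <= 0:
        raise ValueError("source dimensions must be positive")
    # Uniform scale factor that brings the area close to the budget.
    scale = (target_area / (src_w * src_h)) ** 0.5
    # Snap each side to the nearest multiple, never below one multiple.
    w = max(multiple, round(src_w * scale / multiple) * multiple)
    h = max(multiple, round(src_h * scale / multiple) * multiple)
    return w, h


if __name__ == "__main__":
    print(fit_resolution(1920, 1080))
```

Snapping both sides independently can shift the aspect ratio slightly, so a reference image that already matches the intended framing will need the least cropping or padding.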

The key node is WanVideoLongCatAvatarExtendEmbeds. It combines the previous latent, audio embeds, reference image latent, frame count, overlap setting, and continuation logic into a LongCat-compatible conditioning structure. The workflow uses a segmented generation design with 93-frame segments and a 13-frame overlap, so each continuation pass adds up to 80 new frames. The first stage generates the initial speaking segment, then the loop continuation stage uses the previous frames and the audio progress to extend the video while preserving visual continuity.
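
The segment arithmetic can be sketched in a few lines of Python. This is only an illustration of the 93-frame / 13-frame-overlap layout described above, not the node's implementation. The helper names `plan_segments` and `frames_for_audio` are hypothetical, and the 16 fps used in the example is an assumed frame rate.

```python
import math


def frames_for_audio(audio_seconds: float, fps: float) -> int:
    """Number of video frames needed to cover the audio (fps is assumed)."""
    return math.ceil(audio_seconds * fps)


def plan_segments(total_frames: int, segment_len: int = 93,
                  overlap: int = 13) -> list[tuple[int, int]]:
    """Return (start, end) frame ranges, end exclusive, one per sampling pass.

    Each pass after the first starts `overlap` frames before the end of the
    previous one, so it contributes segment_len - overlap new frames.
    """
    if total_frames <= 0:
        raise ValueError("total_frames must be positive")
    if not 0 <= overlap < segment_len:
        raise ValueError("overlap must be in [0, segment_len)")
    segments = [(0, segment_len)]
    while segments[-1][1] < total_frames:
        start = segments[-1][1] - overlap
        segments.append((start, start + segment_len))
    return segments


if __name__ == "__main__":
    needed = frames_for_audio(30.0, 16)  # 30 s of audio at an assumed 16 fps
    for index, (start, end) in enumerate(plan_segments(needed)):
        print(f"pass {index}: frames {start}..{end - 1}")
```

Under these assumptions, 30 seconds of audio needs 480 frames, which takes six passes covering frames 0 to 492. The extra frames past the end of the audio are the ones a trim-to-audio step would drop.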

Continuity matters for digital human production. A normal single-image talking-avatar workflow may work for a short clip, but continuity often becomes harder to maintain as the audio gets longer. This workflow continues the avatar from the previously generated segment, which reduces hard cuts and makes the final result easier to use for narration, tutorials, virtual host videos, product explanations, and long audio-driven content.

Compared with ordinary image-to-video workflows, this graph is more specialized. A basic I2V pipeline can animate an image, but it does not necessarily follow speech timing or handle long audio well. This LongCat Avatar workflow connects audio feature extraction, image identity preservation, segmented sampling, overlap-based continuation, and final video assembly into one reusable system.

This workflow is suitable for AI presenters, virtual anchors, anime hosts, single-character narration, talking character clips, educational explainers, product videos, Bilibili demonstrations, YouTube content, RunningHub showcases, and Civitai workflow publishing.

Main features:

  • LongCat Avatar single-image digital human workflow

  • One image + one audio input

  • Whisper large v3 speech feature extraction

  • LongCat-Avatar-15 main model route

  • LongCat Avatar DMD LoRA support

  • Single reference image identity preservation

  • Image resizing and WanVideo latent encoding

  • WanVideoLongCatAvatarExtendEmbeds conditioning

  • 93-frame segmented generation

  • 13-frame overlap for smoother continuation

  • First-stage generation plus loop extension

  • Audio-driven mouth movement and speaking rhythm

  • Final MP4 output through VHS VideoCombine

  • Trim-to-audio output behavior for cleaner publishing
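
The trim-to-audio behavior in the last item comes down to simple frame arithmetic, sketched below. This is not the VideoCombine node's code, and its exact rounding is not documented here. The function name `trimmed_frame_count` and the 16 fps in the example are assumptions for illustration.

```python
import math


def trimmed_frame_count(generated_frames: int, audio_seconds: float,
                        fps: float) -> int:
    """Frames kept when the video is cut to the audio length.

    Segmented generation usually overshoots the audio, because the last
    segment is always a full segment. Trimming keeps only as many frames
    as the audio needs.
    """
    if generated_frames < 0 or audio_seconds < 0 or fps <= 0:
        raise ValueError("invalid input")
    audio_frames = math.ceil(audio_seconds * fps)
    return min(generated_frames, audio_frames)


if __name__ == "__main__":
    # 493 generated frames, 30 s of audio, assumed 16 fps.
    kept = trimmed_frame_count(493, 30.0, 16)
    print(kept, "frames kept,", 493 - kept, "frames dropped")
```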

Suggested workflow:

Prepare one clean character image first. The face should be visible, the mouth area should not be blocked, and the lighting should be stable. Then prepare a clear audio file with clean speech and limited background noise. Load the image and audio into the workflow, then check the prompt, frame settings, and loop continuation section. Start with a short test to confirm that the identity, lip movement, and camera framing are stable. If the face changes too much, simplify the prompt and keep the motion language gentle. If the speaking movement is too weak, make the avatar behavior more explicit in the prompt. After the first segment works, enable the loop continuation route to generate a longer digital human video aligned with the audio.
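
For the short test run, it can help to cut a few seconds from the start of the audio before loading it. The sketch below does this with Python's standard `wave` module, assuming the audio is an uncompressed PCM WAV file. The function name `clip_wav` and the file names in the usage note are hypothetical.

```python
import wave


def clip_wav(src, dst, seconds: float) -> int:
    """Copy the first `seconds` of a PCM WAV from src to dst.

    src and dst may be file paths or binary file-like objects.
    Returns the number of audio frames written.
    """
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    with wave.open(src, "rb") as reader:
        params = reader.getparams()
        # Never read past the end of the source file.
        wanted = min(reader.getnframes(),
                     int(seconds * reader.getframerate()))
        frames = reader.readframes(wanted)
    with wave.open(dst, "wb") as writer:
        # Reuse channels, sample width, and sample rate from the source;
        # the frame count in the header is corrected when the file closes.
        writer.setparams(params)
        writer.writeframes(frames)
    return wanted
```

A call such as `clip_wav("voice.wav", "voice_test.wav", 5.0)` would produce a five-second test clip. Once identity, lip movement, and framing look stable on that clip, switch back to the full audio file.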

⚙️ RunningHub Workflow

Try the workflow online right now — no installation required.
👉 Workflow: https://www.runninghub.ai/post/2059633182730973185?inviteCode=rh-v1111

If the results meet your expectations, you can later deploy it locally for customization.

🎁 Fan Benefits: Register to get 1000 points, plus 100 points for each daily login, and enjoy 4090-class performance with 48 GB of memory!

📺 Bilibili Updates (Mainland China & Asia-Pacific)

If you're in Mainland China or the Asia-Pacific region, you can watch the video below to see the workflow demonstration and creative breakdown.
📺 Bilibili Video: https://www.bilibili.com/video/BV1gdG161EYB/

☕ Support Me on Ko-fi

If you find my content helpful and want to support future creations, you can buy me a coffee ☕.
Every bit of support helps me keep creating — just like a spark that can ignite a blazing flame.
👉 Ko-fi: https://ko-fi.com/aiksk

💼 Business Contact

For collaboration or inquiries, please contact aiksk95 on WeChat.

I keep the model resources updated on Quark Drive (夸克网盘):
👉 https://pan.quark.cn/s/20c6f6f8d87b
These resources are mainly intended for local users, for creation and learning.
