LongCat Video Avatar 1.5 Single Character ComfyUI | Audio2Video Sync


Turns a single character image and an audio clip into a lip-synced avatar video.

Who it's for: creators who want this pipeline in ComfyUI without assembling nodes from scratch. Not for: one-click results with zero tuning - you still choose inputs, prompts, and settings.

Open preloaded workflow on RunComfy

Why RunComfy first
- Fewer missing-node surprises - run the graph in a managed environment before you mirror it locally.
- Quick GPU tryout - useful if your local VRAM or install time is the bottleneck.
- Matches the published JSON - the zip follows the same runnable workflow you can open on RunComfy.

When downloading for local ComfyUI makes sense - you want full control over models on disk, batch scripting, or offline runs.

How to use (local ComfyUI)
1. Load your inputs (the character image and the audio clip) in the marked loader nodes.
2. Set prompts, resolution, and seeds; start with a short test run.
3. Export from the Save / Write nodes shown in the graph.
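For batch scripting, steps 2 and 3 can also be driven from outside the UI. The sketch below assumes the workflow has been exported in ComfyUI's API format (a dict keyed by node id, each entry holding "class_type" and "inputs") and that a local ComfyUI server is listening on its default address, http://127.0.0.1:8188. The input name "seed" on the sampler node is an assumption; check the exported JSON for the actual key. The helper names are illustrative only.

```python
import copy
import json
import urllib.request


def with_seed(workflow: dict, node_id: str, seed: int, input_name: str = "seed") -> dict:
    """Return a copy of an API-format workflow with one node's seed replaced.

    `input_name` is an assumption; confirm it against your exported JSON.
    """
    patched = copy.deepcopy(workflow)
    patched[node_id]["inputs"][input_name] = seed
    return patched


def build_queue_request(workflow: dict, server: str = "http://127.0.0.1:8188") -> urllib.request.Request:
    """Build (but do not send) a POST request that queues the workflow."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    return urllib.request.Request(
        f"{server}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To queue a run, pass the request to urllib.request.urlopen(). Looping over several seeds with with_seed(workflow, "51", seed) gives a simple way to compare short test renders before committing to a long one.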

Expectations - First run may pull large weights; cloud runs may require a free RunComfy account.


Overview

This workflow turns one character image and an audio clip into a talking avatar video whose mouth movement follows the speech. It runs LongCat-Avatar-15 through WanVideoWrapper nodes: Whisper analyzes the audio to drive lip synchronization, and the Wan 2.1 VAE decodes the result into frames. The output is a vertical (9:16) MP4 ready for publishing. It suits content creators, visual designers, and developers who need a repeatable avatar video step in their pipeline.

Key nodes in the LongCat Video Avatar 1.5 Single Character ComfyUI workflow

LongCatAvatarWhisperEmbeds (#3)

Creates MultiTalk audio embeddings from Whisper features; these drive lip sync and micro‑timing. Keep fps and num_frames consistent with the frame rate you set at export, otherwise audio and video drift apart. When recordings vary in level, enable loudness normalization. This node comes from the WanVideoWrapper LongCat integration.
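The relationship between fps, num_frames, and audio length is simple arithmetic, sketched below. The function names are illustrative and not part of any node; the node may also impose its own constraints on valid frame counts, which this sketch ignores.

```python
import math


def frames_for_audio(duration_s: float, fps: float) -> int:
    """Smallest frame count that covers `duration_s` seconds at `fps`."""
    if duration_s <= 0 or fps <= 0:
        raise ValueError("duration_s and fps must be positive")
    # round() guards against float noise such as 56.00000000001
    return math.ceil(round(duration_s * fps, 6))


def drift_seconds(num_frames: int, export_fps: float, duration_s: float) -> float:
    """Video length minus audio length; non-zero means the two will desync."""
    return num_frames / export_fps - duration_s
```

For example, a 4-second clip at 16 fps needs 64 frames. Exporting those same 64 frames at 25 fps yields a 2.56-second video, 1.44 seconds shorter than the audio, which is the kind of mismatch the node description warns about.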

WanVideoLongCatAvatarExtendEmbeds (#6)

Fuses the reference latent and the audio embeddings into frame‑aware image embeds. If your speech is shorter than the target length, choose whether to pad or loop it so motion remains natural. On longer clips, which are generated in slices, the overlap and reference‑frame settings help keep identity stable from one slice to the next.
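The two ideas in this node description, padding or looping a short sequence and slicing a long one into overlapping windows, can be illustrated generically. This is a conceptual sketch only: it does not reproduce the node's implementation, and the mode names "pad" and "loop" and both function names are assumptions made for the example.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def fit_to_length(items: Sequence[T], target: int, mode: str = "pad") -> List[T]:
    """Stretch or trim `items` to exactly `target` entries."""
    if not items:
        raise ValueError("items must not be empty")
    if target <= len(items):
        return list(items[:target])
    if mode == "pad":
        # Hold the last entry for the remaining frames.
        return list(items) + [items[-1]] * (target - len(items))
    if mode == "loop":
        # Restart from the beginning once the sequence runs out.
        return [items[i % len(items)] for i in range(target)]
    raise ValueError(f"unknown mode: {mode}")


def overlapping_windows(total: int, window: int, overlap: int) -> List[Tuple[int, int]]:
    """Half-open (start, end) frame spans; neighbours share `overlap` frames."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")
    step = window - overlap
    spans = []
    start = 0
    while True:
        end = min(start + window, total)
        spans.append((start, end))
        if end >= total:
            break
        start += step
    return spans
```

With 10 frames, a window of 4, and an overlap of 1, the spans are (0, 4), (3, 7), and (6, 10): each slice begins on the last frame of the previous one, which is what gives a slice something to anchor identity to.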

WanVideoModelLoader (#8)

Loads the LongCat‑Avatar‑15 base model together with the selected LongCat Avatar LoRA for identity fidelity. On constrained hardware, use it with the included VRAM management and block‑swap options. To change style without rewiring, swap in a different LongCat variant or LoRA here.

WanVideoSamplerv2 (#51)

The main generator: it synthesizes frames from the model, scheduler, text embeds, and image embeds. Raise classifier‑free guidance for tighter prompt adherence, or lower it for looser motion. Fix the seed to make renders reproducible.

ImageResizeKJv2 (#25)

Prepares a portrait‑oriented canvas so the avatar fills a 9:16 frame. Keep aspect‑correct crops around the face and shoulders for reliable identity encoding. Choose a width and height that are multiples of the divisor the encoder/decoder requires; this avoids edge artifacts.
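A small helper can pick a portrait size that satisfies a divisibility constraint. The divisor of 16 used below is an assumption for illustration; use whatever value the resize node or model actually requires. The function name is hypothetical.

```python
from typing import Tuple


def portrait_size(target_height: int, multiple: int = 16,
                  aspect: Tuple[int, int] = (9, 16)) -> Tuple[int, int]:
    """Return (width, height) near the requested aspect, both divisible by `multiple`."""
    def snap(value: float) -> int:
        # Round to the nearest multiple, never below one multiple.
        return max(multiple, round(value / multiple) * multiple)

    height = snap(target_height)
    width = snap(height * aspect[0] / aspect[1])
    return width, height
```

A height of 1280 gives exactly 720x1280. A height of 832 gives 464x832, because the ideal width of 468 is not divisible by 16; the result is slightly narrower than true 9:16, which is the trade-off for clean divisibility.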

VHS_VideoCombine (#14)

Muxes the frames and audio into a single MP4 with your chosen frame rate and filename prefix. Enable metadata saving to make iterations easier to track. This node is part of VideoHelperSuite.

Notes

See the RunComfy page for the latest node requirements.
