LTX-2.3 Multiple Subject Reference Generation (Distilled GGUF) workflows

Overview

This workflow generates videos from multiple reference images (subjects and a background). It uses the Distilled GGUF model for fast generation.

Core Features

Multi-Subject Reference

The workflow references up to five images: 1 to 4 subject images and 1 background image. Images cannot be referred to by number in the prompt, so describe the details of each image in the prompt to get better results. For more details, please visit https://huggingface.co/LiconStudio/LTX-2.3-Multiple-Subject-Reference.
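As a minimal sketch of the advice above, the following Python helper assembles a prompt that describes each reference image in words instead of by number. The function name, the sentence layout, and treating the background as optional are all assumptions made for illustration; the workflow itself only requires that the descriptions appear somewhere in the prompt.

```python
def build_reference_prompt(subjects, action, background=None):
    """Compose a prompt that describes every reference image in words.

    Hypothetical helper: the workflow does not ship this function.
    subjects   -- 1 to 4 short descriptions, one per subject image
    action     -- what should happen in the video
    background -- optional description of the background image
    """
    if not 1 <= len(subjects) <= 4:
        raise ValueError("expected 1 to 4 subject descriptions")

    # One sentence per subject, so each image is identified by its details.
    parts = [f"{s.strip().rstrip('.')}." for s in subjects]
    if background:
        parts.append(f"The background is {background.strip().rstrip('.')}.")
    parts.append(action.strip())
    return " ".join(parts)


prompt = build_reference_prompt(
    ["A woman with short red hair in a green coat",
     "A small white dog with a blue collar"],
    "They walk toward the camera.",
    background="a snowy street at night",
)
print(prompt)
```

The more distinctive the details (hair color, clothing, accessories), the easier it should be to tie each sentence to the matching image.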

Single / Extend Generation Mode

You can choose whether to generate the video once or extend it afterward.

Note: Extend mode does not support multi-subject reference. It simply extends the video based on the information from the very last second.
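To make "the very last second" concrete, here is a small Python sketch that selects the trailing frames covering one second of video. How the workflow actually picks its conditioning frames is not documented here, so the function name and the rounding are assumptions.

```python
def last_second_frames(frames, fps):
    """Return the frames that make up the final second of a clip.

    Hypothetical helper for illustration only.
    frames -- a list of frames (any objects)
    fps    -- frames per second of the clip
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    # One second of video is roughly `fps` frames; never ask for more than exist.
    count = min(len(frames), max(1, round(fps)))
    return frames[-count:] if count else []


clip = list(range(100))           # stand-in for 100 decoded frames
tail = last_second_frames(clip, 24)
print(len(tail), tail[0], tail[-1])
```

Anything that happened earlier than this tail, including the subject references, is not visible to the extension step, which is why a subject that leaves the frame before the last second may not come back consistently.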

Three Prompt Input Modes

  1. Prompt enhancement using Ollama

  2. Native LTX prompt enhancement

  3. Plain (no enhancement)

If you do not need Ollama, you can disconnect the Ollama SubGraph node or simply delete it.
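For readers curious what Ollama-based prompt enhancement amounts to outside ComfyUI, the sketch below sends a prompt to a local Ollama server through its /api/generate endpoint. This is not the code the Ollama SubGraph node runs; the model name, the system instruction, and the helper names are assumptions, and the native LTX enhancement mode is left out because it happens inside the workflow.

```python
import json
import urllib.request

# Default address of a locally running Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_ollama_request(prompt, model="llama3.2"):
    """Build (but do not send) a non-streaming generate request.

    The model name and system instruction are placeholders.
    """
    payload = {
        "model": model,
        "system": "Rewrite the user's text as one detailed video prompt.",
        "prompt": prompt,
        "stream": False,  # return a single JSON object instead of a stream
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def enhance_with_ollama(prompt, model="llama3.2", timeout=120):
    """Send the request and return the generated text (needs a running server)."""
    request = build_ollama_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"].strip()


def resolve_prompt(prompt, mode="plain"):
    """Pick between the plain and Ollama-enhanced prompt."""
    if mode == "plain":
        return prompt
    if mode == "ollama":
        return enhance_with_ollama(prompt)
    raise ValueError(f"unsupported mode: {mode}")


print(resolve_prompt("a woman and a dog walking in the snow"))
```

Only the plain path runs without a server; calling resolve_prompt(..., mode="ollama") requires Ollama to be running locally with the chosen model already pulled.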

Optional Features

Preview Switch: Displays a preview during sampling.

Audio-driven: Generates a video that matches an existing audio file.

Full Mode: Generates with 15 steps and CFG 3.0. This takes a very long time. Camera work and motion may improve slightly, but the generation time is not practical.

Double-Frame Mode: Intended for scenes with intense motion. Rendering at twice the frame rate makes facial distortion less likely.
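The cost of Double-Frame Mode can be estimated with simple arithmetic: for the same clip length, twice the frame rate means twice as many frames to render. The sketch below assumes the mode simply doubles the frame rate; the function name is hypothetical, and any frame-count constraints the model may impose are ignored.

```python
def double_frame_settings(duration_s, base_fps):
    """Estimate render settings when the frame rate is doubled.

    Hypothetical helper: assumes doubling fps is the only change.
    """
    if duration_s <= 0 or base_fps <= 0:
        raise ValueError("duration and fps must be positive")
    fps = base_fps * 2
    return {
        "fps": fps,
        "frames": round(duration_s * fps),            # frames to render
        "base_frames": round(duration_s * base_fps),  # frames without the mode
    }


settings = double_frame_settings(duration_s=5, base_fps=24)
print(settings)
```

Because each frame covers half as much motion, fast movement is spread over more frames, which is consistent with the reduced facial distortion described above, at the price of roughly double the sampling work.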

Audio Reference: Voice cloning is possible by combining this with ID-LoRA (id-lora-talkvid-3k, id-lora-celebvhq-3k). Please ensure the reference audio file contains only the recorded voice.

Note

If you encounter an error, please update ComfyUI-KJNodes and ComfyUI-LTXVideo to the latest version.
