LTX 2.3 basic GGUF 720p workflow

This is the same as the default ComfyUI workflow, except that it loads the models through the GGUF custom node. You can insert images, audio, and video at any frame, so most input combinations are possible.

Supported modes: T2V, S2V, V2V, and I2V with a first, last, or middle frame.

voice clone: input a few seconds of reference audio, then crop those same few seconds from the output after generation is complete.
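As a rough sketch of the cropping step, the helper below works out where the generated part starts once the reference seconds are removed. The function name, the frame rate, and the sample rate are assumptions for illustration only; use the values from your own workflow.

```python
import math


def trim_offsets(ref_seconds, fps, sample_rate):
    """Return (first_frame_to_keep, first_audio_sample_to_keep).

    Hypothetical helper: ref_seconds is the length of the reference
    audio that was fed in, fps and sample_rate describe the output.
    """
    if ref_seconds < 0 or fps <= 0 or sample_rate <= 0:
        raise ValueError("ref_seconds must be >= 0; fps and sample_rate must be > 0")
    # Round up to a whole frame so no reference material is left in the output.
    first_frame = math.ceil(ref_seconds * fps)
    # Cut the audio at the same instant as that frame boundary.
    first_sample = round(first_frame * sample_rate / fps)
    return first_frame, first_sample


# Example values only (assumed): 3 s of reference audio, 24 fps, 48 kHz.
frame, sample = trim_offsets(3.0, 24, 48000)
```

Cutting the audio at the frame boundary, rather than at the exact reference length, keeps picture and sound aligned after the crop.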

reference image: input a starting image, then prompt for a completely different action. (The character descriptions remain the same, however.) Yes, this is what is usually called a failed I2V. As with voice clone, crop the initial image from the output.

extend video: input the images and audio extracted from the existing video. The output is extended from them for the remaining length.

GGUF custom node: https://github.com/city96/ComfyUI-GGUF

(Please update your GGUF node and ComfyUI to the latest versions.)

LTX2.3 and other: https://huggingface.co/unsloth/LTX-2.3-GGUF/tree/main

LTX2.3 GGUF: https://huggingface.co/QuantStack/LTX-2.3-GGUF/tree/main/LTX-2.3-distilled

VAE: https://huggingface.co/Kijai/LTX2.3_comfy/tree/main/vae

upscale model: https://huggingface.co/Lightricks/LTX-2.3/tree/main

text encoder:

gemma3 GGUF: https://huggingface.co/unsloth/gemma-3-12b-it-GGUF/tree/main

embedding: https://huggingface.co/Kijai/LTX2.3_comfy/tree/main/text_encoders

Place the text encoder-related files here: ComfyUI\models\text_encoders

Place the audio VAE here: ComfyUI\models\checkpoints
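If you want to confirm the folders exist before loading the workflow, a minimal check could look like the sketch below. It covers only the two folders named above; the function name and the idea of passing the ComfyUI root folder are assumptions, not part of ComfyUI itself.

```python
from pathlib import Path

# Only the two locations mentioned above; other model types are not checked.
EXPECTED_DIRS = {
    "text encoder files": Path("models") / "text_encoders",
    "audio VAE": Path("models") / "checkpoints",
}


def missing_model_dirs(comfy_root):
    """Return the labels whose folder is missing under comfy_root (hypothetical helper)."""
    root = Path(comfy_root)
    return [label for label, rel in EXPECTED_DIRS.items()
            if not (root / rel).is_dir()]
```

Calling missing_model_dirs with the path of your ComfyUI folder returns an empty list when both folders are present.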