LTX Video RTX3060 12GB VRAM

I'm uploading this workflow for use with the Stability Matrix app. YouTube video:

I have zero knowledge of how this model works. It's not SDXL; it's a DiT (diffusion transformer) model, like Flux. ~~I currently have only 12GB of RAM, so I can't show better results yet; more steps and a higher resolution may fix that.~~

I now have 28GB, but LTX is really bad. You can try asking Gemini to write the descriptive prompts for you. I tried this prompt:
Your task is to imagine and describe a natural visual action or camera movement that could realistically unfold from the still moment, as if capturing the next 3 seconds of a scene. Focus exclusively on visual storytelling—do not include sound, music, inner thoughts, or dialogue.

Infer a logical and expressive action or gesture based on the visual pose, gaze, posture, hand positioning, and facial expression of characters. For instance:

- If a subject's hands are near their face, imagine them removing or revealing something

- If two people are close and facing each other, imagine a gesture of connection like touching, smiling, or leaning in.

- If a character looks focused or searching, imagine a glance upward, a head turn, or them interacting with an object just out of frame.

Describe these inferred movements with precision and clarity, as a cinematographer would. Always write in a single cinematic paragraph.

Be as descriptive as possible, focusing on details of the subject's appearance and intricate details on the scene or setting.

Follow this structure:

- Start with the first clear motion or camera cue.

- Build with gestures, body language, expressions, and any physical interaction.

- Detail environment, framing, and ambiance.

If any additional user instructions are added after this sentence, use them as reference for your prompt.

Example: The woman shifts her weight back, a slow, controlled motion that begins with a gentle flexion of her knees. She doesn't fully stand, but rather her body lowers a few inches, her weight settling onto her heels. Her torso remains straight, maintaining a poised posture as she descends. As she reaches the lowest point of the movement, her body settles, the soft curve of her lower back coming into sharper focus. She holds this low-to-the-ground position for a brief moment before she begins a new, fluid transition into a kneeling pose, her body unfolding with grace as she brings her knees to the floor. The camera stays in its medium shot, capturing the full motion and emphasizing the graceful control she has over her body.

Otherwise, focus only on the input image analysis.

Send this prompt with the image as an attachment; you can then briefly describe how you want the action performed and how it should end. Otherwise, be precise and brief, and use the kind of natural language that SDXL or Flux supports.
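If you'd rather script this than paste it by hand, here's a rough sketch of how the text part could be put together. It only builds the string; it doesn't talk to Gemini or attach the image. The function name `build_request_text` and its parameters are made up for this example, and putting the extra instruction after the prompt is my assumption based on how the prompt itself is worded.

```python
def build_request_text(base_prompt: str, user_note: str = "") -> str:
    """Combine the long instruction prompt with an optional short note.

    base_prompt: the full prompt text quoted above (paste it in as-is).
    user_note:   optional short description of the action and how it ends.
    Both names are hypothetical; nothing here is part of any real API.
    """
    base = base_prompt.strip()
    note = user_note.strip()
    if not note:
        # No extra instruction: the model works from the image alone.
        return base
    # The extra instruction goes after the prompt, separated by a blank line.
    return f"{base}\n\n{note}"
```

The returned text is what you would send alongside the image attachment.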

It sucked less with 10 steps, a CRF of 18, 20 frames, and a seed between 45 and 52.
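To try that whole seed range without retyping everything, a small sketch like this lists the runs to queue. It's plain Python bookkeeping, not a ComfyUI or Stability Matrix API: `RunSettings`, `seed_sweep`, and the field names are hypothetical and won't match the actual node inputs in the workflow, so copy the values over by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunSettings:
    # Hypothetical field names; map them to the workflow's nodes yourself.
    steps: int
    crf: int
    frames: int
    seed: int


def seed_sweep(first_seed: int = 45, last_seed: int = 52,
               steps: int = 10, crf: int = 18, frames: int = 20) -> list[RunSettings]:
    """Return one settings entry per seed, with both end seeds included."""
    return [
        RunSettings(steps=steps, crf=crf, frames=frames, seed=seed)
        for seed in range(first_seed, last_seed + 1)
    ]


if __name__ == "__main__":
    for run in seed_sweep():
        print(run)
```

Everything except the seed stays fixed, so any difference between the clips comes from the seed alone.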

Model type	Workflow
Base model	LTXV
Published	2025-09-12
