LTX-2 Eat

A LoRA that gobbles everyone up! Now with sound (really, turn it on in the examples).

Basically, you start the video with a subject. Then, suddenly, the camera zooms out, revealing that the subject has been miniaturized, and another character steps up and eats them in a cartoonish, non-graphic way.

This is my fifth LTX-2 LoRA (published globally), and it marks the start of porting my legacy Civitai LoRAs from Wan to LTX-2.

This LoRA works best with first-last frame (FLF) conditioning, though a start frame alone may be sufficient if you describe the other subject well. Beware: FLF inherits all of LTX-2's flaws and can occasionally produce slideshow-like results (I don't know why it sometimes spawns the first and last frames again at the end; the best fix is simply to cut that part). Easter egg: if you set the first and last frames to the same picture, characters can devour themselves in a loop.

Unlike all my previous LTX-2 LoRAs, this one was super hard to train. With CREPA, TREAD, an unfrozen FFN, a higher rank and Prodigy, the loss barely decreased and even showed signs of divergence (an initially stable loss curve eventually turned into extremely frequent oscillations without any decline). Needless to say, all I got was pure body horror: tongues, hands being eaten themselves, distorted limbs, etc.

Then I remembered that when the loss oscillates heavily, Huber loss rather than MSE is needed for sharper results. I switched to scheduled Huber loss (exponential schedule), which stabilized the loss curve considerably and finally produced the much-needed downturn. Interestingly, this loss choice made the CREPA regularization loss curve more than a simple monotonic sigmoid; it even developed smooth hills.
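For readers unfamiliar with it, here is a minimal sketch of a scheduled pseudo-Huber loss with an exponential schedule, modeled on the exponential schedule used for scheduled Huber loss in kohya's sd-scripts. It is an assumption that SimpleTuner's scheduled Huber loss follows the same formula; the function names, the `c_min=0.1` floor and the convention that t = 1 means pure noise are illustrative choices, and plain Python lists stand in for tensors.

```python
import math


def exponential_huber_c(t: float, c_min: float = 0.1) -> float:
    """Huber threshold for a normalized timestep t in [0, 1] (assumed: t = 1 is pure noise).

    Exponential schedule: c(t) = exp(-alpha * t) with alpha = -ln(c_min),
    so c(0) = 1.0 (near-clean samples) and c(1) = c_min (pure noise).
    """
    alpha = -math.log(c_min)
    return math.exp(-alpha * t)


def pseudo_huber(pred: list[float], target: list[float], c: float) -> float:
    """Mean pseudo-Huber loss 2c * (sqrt(d^2 + c^2) - c) over all elements.

    For |d| much smaller than c it behaves like d^2 (MSE);
    for |d| much larger than c it grows like 2c * |d| (L1-like, bounded gradient).
    """
    assert len(pred) == len(target) and pred
    total = 0.0
    for p, y in zip(pred, target):
        d = p - y
        total += 2.0 * c * (math.sqrt(d * d + c * c) - c)
    return total / len(pred)


# Example: threshold at a mid-noise timestep and the loss on a toy prediction.
c = exponential_huber_c(0.5)
print(round(c, 4), pseudo_huber([0.2, -1.5], [0.0, 0.0], c))
```

The idea is that the threshold shrinks as noise grows: near-clean timesteps keep a sharp, MSE-like gradient, while noisy timesteps get a bounded, L1-like gradient, which is the usual argument for why it can tame outlier-driven oscillations like the ones described above.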

Warning: because of the deep-feature CREPA or TREAD used in training, some of the videos might have a slightly washed-out feel. If you experience it, try adding vivid colors to the positive prompt and terms like washed out, gray to the negative prompt. It also goes much better if the start images are themselves vivid.

The runtime for this experiment totaled 5 hours (plus five failed attempts lasting up to 8 hours each). Training ran on a single RTX 5090 with zero blocks swapped, at ~4 s/it.
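As a quick sanity check on these numbers, 4000 steps at ~4 s/it comes to about 4.4 hours of pure optimization, in line with the ~5-hour total. This assumes training stopped at (or near) the selected 4000-step checkpoint; the text doesn't break down the remaining time, so attributing it to validation, caching or checkpoint saving is only a guess.

```python
def estimate_hours(steps: int, sec_per_it: float) -> float:
    """Pure optimization time in hours (ignores validation, caching and checkpoint saving)."""
    return steps * sec_per_it / 3600.0


# Figures from the text: final checkpoint at 4000 steps, ~4 s/it on one RTX 5090.
hours = estimate_hours(4000, 4.0)
print(f"{hours:.2f} h")  # 4.44 h
```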

The dataset consists of 6 organic video fragments (repeated 2 times) that the original LoRA was trained on, plus 47 hand-picked Wan2.2 generations made with that LoRA applied. The final checkpoint was picked at 4000 optimization steps.
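To see how much weight the organic footage gets relative to the synthetic generations, here is a small calculation. It assumes "repeated 2 times" means each organic clip appears twice per pass and each generated clip once; SimpleTuner's own `repeats` field may count repetitions differently, so check the published dataset config before relying on this.

```python
def sample_mix(organic_clips: int, organic_repeats: int, synthetic_clips: int) -> tuple[int, float]:
    """Return (samples per pass, fraction of samples that are organic).

    Assumption: every organic clip appears `organic_repeats` times per pass,
    every synthetic clip appears exactly once.
    """
    organic = organic_clips * organic_repeats
    total = organic + synthetic_clips
    return total, organic / total


total, organic_share = sample_mix(6, 2, 47)
print(total, f"{organic_share:.1%}")  # 59 20.3%
```

Under that reading, roughly one sample in five comes from the original footage, with the rest coming from the Wan2.2 generations.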

The SimpleTuner training and dataset configs are in config.json and ltx2-multiresolution-eat-t2v-v2.json, respectively.

The ComfyUI workflows are embedded in the .mp4 video files and are also available in the Hugging Face repo.

The LTX-2 LoRA is hosted on Hugging Face at https://huggingface.co/kabachuha/ltx2-eat.

Trigger words

You should use "eat style" to trigger the generation.

Actually, you no longer should, because LTX-2 will add a spoken "eat style" at the beginning of the video. Just describe the action similarly to the example prompts and it will do the job!

For Wan2.1 (legacy):

This LoRA introduces the concept of subjects and things being eaten in a cartoon-like way: they are suddenly tossed into a giant mouth, then chewed and consumed non-graphically.

To choose the eater and/or the thing being eaten, use VACE. The examples illustrate each mode, in order: 2x first-frame2video, first-last2video, last-frame2video, and pure text2video (not recommended, as the LoRA is slightly overtrained in this mode).

The recommended generation resolution is 512x512 or close to it, with a recommended duration of 45 base frames (49 total, about 3 seconds).
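When picking other durations, it helps to know which frame counts are valid. This sketch assumes Wan2.1's default output of 16 fps and a VAE with 4x temporal compression, which restricts frame counts to the form 4k + 1 (both are assumptions, not stated in the text); under those assumptions 49 frames last just over 3 seconds.

```python
import math

WAN_FPS = 16  # assumption: Wan2.1's default output frame rate


def is_valid_frame_count(n: int) -> bool:
    """True for counts of the form 4k + 1 (assumed 4x temporal VAE compression)."""
    return n >= 1 and (n - 1) % 4 == 0


def snap_up_frame_count(n: int) -> int:
    """Smallest valid 4k + 1 count that is >= n."""
    return 1 + 4 * math.ceil(max(n - 1, 0) / 4)


def clip_seconds(frames: int, fps: int = WAN_FPS) -> float:
    """Clip duration in seconds at the given frame rate."""
    return frames / fps


print(is_valid_frame_count(49), snap_up_frame_count(50), clip_seconds(49))  # True 53 3.0625
```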

The t2v training was done with diffusion-pipe for 100 epochs with a flow shift of 4.5.
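For context on what "flow shift" does: in SD3-style and Wan flow-matching schedulers (for example diffusers' `FlowMatchEulerDiscreteScheduler`), a shift s remaps each noise level as sigma' = s * sigma / (1 + (s - 1) * sigma). This sketch assumes diffusion-pipe's flow shift and the sampling shift recommended for image2video both use this standard formula; it only illustrates how the shift bends a linear schedule toward high noise.

```python
def shift_sigma(sigma: float, shift: float) -> float:
    """Flow-matching time shift: s * sigma / (1 + (s - 1) * sigma).

    shift = 1 leaves sigma unchanged; shift > 1 pushes intermediate sigmas toward 1
    (more noise), so more of the schedule is spent at high noise levels.
    Endpoints are fixed: 0 -> 0 and 1 -> 1.
    """
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)


def shifted_schedule(num_steps: int, shift: float) -> list[float]:
    """Sigmas from 1.0 down to 0.0 in num_steps equal steps, then shifted."""
    return [shift_sigma(1.0 - i / num_steps, shift) for i in range(num_steps + 1)]


# Compare an unshifted schedule with the training shift and a sampling-range shift.
for s in (1.0, 4.5, 8.0):
    print(s, [round(x, 3) for x in shifted_schedule(4, s)])
```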

For image2video/flf2video, the kijai VACE workflows are recommended, with the standard LoRA weight of 1.0, CFG 4.0-6.0, shift 8.0-16.0 and cfg_zero_star (see the video metadata in ComfyUI).
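cfg_zero_star refers to CFG-Zero*. As I understand its published reference implementation, it rescales the unconditional prediction by an optimized scale s* = (v_cond . v_uncond) / ||v_uncond||^2 before applying guidance, and zeroes the output for the first step(s). The sketch below uses plain lists and a hypothetical `zero_init_steps` parameter; it is not necessarily how the ComfyUI node or the kijai wrapper implement it.

```python
def cfg_zero_star(
    v_cond: list[float],
    v_uncond: list[float],
    guidance_scale: float,
    step: int,
    zero_init_steps: int = 1,  # hypothetical: number of initial steps that output zero
    eps: float = 1e-8,
) -> list[float]:
    """Guided velocity following the CFG-Zero* formulation (sketch)."""
    if step < zero_init_steps:
        # Zero-init: skip the earliest prediction entirely.
        return [0.0] * len(v_cond)
    # Optimized scale: projection coefficient of v_cond onto v_uncond.
    dot = sum(c * u for c, u in zip(v_cond, v_uncond))
    norm_sq = sum(u * u for u in v_uncond) + eps
    s_star = dot / norm_sq
    # Ordinary CFG, but with the unconditional branch rescaled by s_star.
    return [u * s_star + guidance_scale * (c - u * s_star) for c, u in zip(v_cond, v_uncond)]


print(cfg_zero_star([1.0, 0.0], [0.0, 1.0], guidance_scale=5.0, step=3))  # [5.0, 0.0]
```

When s* = 1 the formula reduces to ordinary CFG, v_uncond + w * (v_cond - v_uncond), so the rescaling only changes anything when the two predictions disagree in magnitude or direction.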

It works best on cartoon-stylized and anime characters and can be weird on realistic ones. For realistic subjects, supplying an additional, existing cartoon-style reference is advised.

Known issues: the object is sometimes bitten/chewed like gum but not swallowed (this can be slightly countered by adding a hand pushing it inside).

The trigger word is 'eat style'. The best prompts follow this template:

"""

eat style. The video begins with [object]. Then a gigantic cartoon hand seizes the [object] from below and tosses it into a gigantic mouth, which appeared on the right side. The camera zooms out, showing the new [eater] chewing and fully swallowing the old tiny [object].

"""

P.S. In case you can't find the metadata, the example workflow for the wolf is here: https://gist.github.com/kabachuha/a4b5ed1b46b6d4fb5f9e91d8aae1e482

Model Type	LORA
Base Model	LTXV2
Published	2026-01-16
Trained Words	eat style

Cartoonishly Eaten - LTX-2 / Wan2.1 14b T2V
