Squish - One Hand Only! - LTX-2 / Wan2.2 i2v 14b

LTX-2 update!

Squish anything now with only one hand!

This LoRA is a twist on the Squish-style LoRAs, where the object is grabbed from two sides and then squished between the hands. In contrast, the Hand-squish LoRA does it with one hand. It reaches the faraway object and... well... squishes it. Into a mess. Figurines, characters, vehicles, granite – anything you want!

The classic recipe is used: Huber, FFN, CREPA. But... NO PRODIGY. The Prodigy run diverged after a few hours, so I replaced the optimizer with bf16 AdamW at a high starting LR and a polynomial schedule. As a side effect, I saw the grayish tint of the videos slightly reduced.

The training converged in 3000 steps, taking 5 hours 7 minutes on a 5090 (musubi swap 4, because of the occasional long 121-frame videos).
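As a rough illustration of the polynomial schedule mentioned above, here is a minimal sketch. Only the 3000-step run length comes from this page; the base LR, end LR and power are hypothetical placeholders, and the function is my own generic formula, not the trainer's actual implementation.

```python
def polynomial_lr(step, total_steps, base_lr, end_lr=0.0, power=2.0):
    """Polynomial decay from base_lr at step 0 down to end_lr at total_steps."""
    # Clamp the step so values outside the run do not over- or undershoot.
    progress = min(max(step, 0), total_steps) / total_steps
    return (base_lr - end_lr) * (1.0 - progress) ** power + end_lr


# Hypothetical base LR; only total_steps=3000 is taken from the text.
for step in (0, 1500, 3000):
    print(step, polynomial_lr(step, total_steps=3000, base_lr=1e-3))
```

With power 1.0 this degenerates to plain linear decay; higher powers drop the LR faster early on.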

This is my sixth LTX-2 LoRA; for the training recipes, see my previous ones.

I'm glad the hands are quite anatomically correct, and cases of six-plus fingers or multiple hands are rare for me here :). However, the arms are still weird.

The dataset consisted of some organic clips of items being grabbed, continued in Wan with the (two-hand) Squish LoRA enabled just as the hand touches the object, plus a large number of clips synthesized with the Wan version of this LoRA. For regularization, organic videos of hands opening and closing without items were also added.

The ComfyUI workflows are embedded in the .mp4 video files, or available as a JSON file on Hugging Face.

Actually, I think this LTX-2 version is now better, more spatially correct and more diverse than the Wan one. Maybe it's because of the larger dataset and the self-boosting from distilling the Wan version.

Wan2.2

This LoRA allows you to squish objects not only when they are brought to the camera, but at a distance too! Unlike vanilla Squish, you can use a single hand without much prompting!

Prompt format and TL;DR

Prompt:

In the video, an [object] is presented. A person's long right arm appears from the side, stretches all way to the [object] and grabs the [object]. The [object] is held in the person’s right hand in the distance. The person then presses on the [object], causing a sq41sh squish effect. The person keeps pressing down on the [object], further showing the sq41sh squish effect.
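If you generate many clips, filling in the template by hand gets tedious. Here is a minimal sketch that substitutes the object name; the prompt text is copied verbatim from above, while `PROMPT_TEMPLATE` and `build_prompt` are hypothetical helper names of my own.

```python
# Prompt text copied verbatim from the template above; [object] is the placeholder.
PROMPT_TEMPLATE = (
    "In the video, an [object] is presented. A person's long right arm appears "
    "from the side, stretches all way to the [object] and grabs the [object]. "
    "The [object] is held in the person’s right hand in the distance. "
    "The person then presses on the [object], causing a sq41sh squish effect. "
    "The person keeps pressing down on the [object], further showing the "
    "sq41sh squish effect."
)


def build_prompt(obj: str) -> str:
    """Replace every [object] placeholder with the given object name."""
    return PROMPT_TEMPLATE.replace("[object]", obj)


print(build_prompt("rubber duck"))
```

Note that the article "an" before the first placeholder is kept as in the original template, so it will not always agree with the object name.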

My usage settings are straightforward: 3.5 CFG, shift 8.0 (default), dpm++-sde scheduler, 25 (13+12) steps. No speed-up or other additional LoRAs were used. Made with Kijai's wrapper.

I used uncond skip of the 10th block (SLG) to enhance the video quality a bit. No other enhancements were applied to the gallery generations. It all works well at 512x512, 640p and 720p.

For details, see the workflows in the videos' metadata or their duplicates on GitHub.

Now the TL;DR part. Below I describe my struggles with creating this LoRA. I think it may be useful for fellow LoRA trainers and techy users in general.

The Problem

Why was I even motivated to create this LoRA?

The vanilla Squish LoRA and its synthetic distillations have a huge bias towards two hands, complete with their fingers, enveloping the object from the sides. Additionally, the object is torn from its place in the scene and moved closer to the camera, which hurts the immersion. The cool idea, I thought, would be not to pull the object towards you, but on the contrary, to reach for the object with an arm first!

Making only one hand appear is a nightmare of prompt engineering, LoRA strength tuning and conditioning arithmetic in Comfy (at least, it was for me), which is why I decided to create this fine-tune.

Methodology

Because of the bias, the "fix" dataset was impossible to get from raw objects. Then I changed the starting frame to hands positioned near the objects, and the success rate increased. I thought it was over, but no, the left hands intervened very often, bringing the object to the camera and basically undoing all the progress. Only after wasting multiple hours on rewriting prompts, adjusting the LoRA's strength and averaging the conditionings with ComfyUI's coefficients did I find a suitable preset.

To compose the correcting dataset, I made a collection of clips of hands touching various items in various places, then retrieved the last frame of each and continued it with the calibrated Wan2.2 image2video and the Squish LoRA.

Before the concatenation, the "real" fragment was passed through Wan's VAE to increase the plausibility. At that resolution, a thin iridescent vertical line was visible at the right edge, so I cut it off manually with a homegrown "vibe-coded" program, using the same preset resolution for all videos so as not to create a lot of aspect buckets.

The dataset was composed of:

10 hand-crafted squish situations (two-part videos: first the hand reaches the object, then the hand squishes it).
2 synthetic squishes where a hand is already touching the object (lazy image-to-video, without making the "reaching" part).
Regularization: 18 curated high-quality RemadeAI-derived synthetic samples from the Omni-VFX dataset.

As I said, the success rate was abysmal, so the dataset was tiny.

The data mix consisted of 4 repeats of the manual dataset + 1 instance of the regularization set per epoch.
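To make the mix concrete, here is the arithmetic as a small sketch. It assumes the "manual dataset" means all 12 squish clips (10 hand-crafted + 2 synthetic); if only the 10 hand-crafted ones were repeated, the numbers shift accordingly. The variable names are mine.

```python
# Counts taken from the dataset description above.
hand_crafted = 10
synthetic = 2
regularization = 18

# Assumption: both squish groups form the "manual dataset" that is repeated.
manual = hand_crafted + synthetic
manual_repeats = 4

samples_per_epoch = manual_repeats * manual + regularization
manual_share = manual_repeats * manual / samples_per_epoch

print(f"{samples_per_epoch} samples per epoch, {manual_share:.0%} manual")
```

So despite the regularization set being larger in raw clip count, the repeats make the one-hand clips dominate each epoch.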

Fine-tuning the Squish LoRA took 6 epochs for high noise and 4 epochs for low noise: one night on dual GPUs, 5090 + 4090 (data parallel with diffusion-pipe).

The training used Prodigy as the optimizer and pseudo-Huber loss with a Huber constant of 0.5 as the loss function, to experiment with this new setting.
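For those unfamiliar with pseudo-Huber loss, here is a minimal scalar sketch with the constant of 0.5 from above. It uses the textbook form c² · (sqrt(1 + (r/c)²) − 1); trainers may scale it differently, so treat the exact normalization as an assumption rather than what diffusion-pipe computes.

```python
import math


def pseudo_huber(residual: float, c: float = 0.5) -> float:
    """Textbook pseudo-Huber loss: quadratic near zero, linear for large errors."""
    return c ** 2 * (math.sqrt(1.0 + (residual / c) ** 2) - 1.0)


# Small residuals behave like 0.5 * r^2, large ones grow roughly like c * |r|,
# so outliers pull on the weights much less than with plain MSE.
for r in (0.01, 0.5, 10.0):
    print(r, pseudo_huber(r), 0.5 * r ** 2)
```

The point of the setting is robustness: a few bad synthetic frames contribute a bounded gradient instead of dominating the update.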

Usage settings

The usage settings are straightforward: 3.5 CFG, shift 8.0 (default), dpm++-sde scheduler, 25 (13+12) steps. No speed-up or other additional LoRAs were used.

I used uncond skip of the 10th block (SLG) to enhance the video quality a bit. No other enhancements were applied to the gallery generations.
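To give a feeling for what shift 8.0 and the 13+12 split do, here is a rough sketch. It uses the usual flow-matching time-shift formula, shift · σ / (1 + (shift − 1) · σ), on top of a linear base schedule. The linear base is my simplifying assumption; the actual sampler in the wrapper may space its sigmas differently, so the printed values are illustrative only.

```python
def shift_sigma(sigma: float, shift: float = 8.0) -> float:
    """Flow-matching time shift: pushes sigmas towards the high-noise end."""
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)


steps = 25
# Assumption: linear base schedule from 1.0 down to 0.0 before shifting.
sigmas = [shift_sigma(1.0 - i / steps) for i in range(steps + 1)]

# 13 steps for the high-noise model, the remaining 12 for the low-noise one.
high_noise_sigmas = sigmas[:14]   # 14 boundaries = 13 steps
low_noise_sigmas = sigmas[13:]    # 13 boundaries = 12 steps

print("hand-over sigma:", round(sigmas[13], 4))
```

Under these assumptions the hand-over between the two models happens while the latent is still quite noisy, which is why the high-noise half gets a comparable number of steps.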

The training resolution is < 512 max width/height; however, now that the bias is removed, you can try going higher. 512 and 640 work, and you can get results even at 720p! (because the LoRA is trained too much, ha-ha!) The gallery examples are at 720p, to show off :0

The number of training frames is 81 (~5 seconds at 16 fps, or ~3 at 24 fps), and this number is recommended, but it also works at 65 frames; other counts were not tested.

I used Kijai's ComfyUI wrapper; native workflows may need slight adjustments. The workflows are attached as video metadata. If they are hard to extract, they are also available in JSON form at this GitHub Gist link.
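A quick sketch of the frame-count arithmetic above. The 4n+1 check is an assumption on my side (it matches 81 and 65, and is the pattern Wan-style video models are commonly run with), not something stated on this page; the helper names are hypothetical.

```python
def clip_seconds(frames: int, fps: float) -> float:
    """Clip duration in seconds for a given frame count and frame rate."""
    return frames / fps


def is_4n_plus_1(frames: int) -> bool:
    """Assumed frame-count pattern: 1 + a multiple of 4 (e.g. 65, 81)."""
    return frames >= 1 and (frames - 1) % 4 == 0


for frames in (81, 65):
    print(frames, is_4n_plus_1(frames),
          clip_seconds(frames, 16), clip_seconds(frames, 24))
```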

The gallery starting images were produced with Qwen, SDXL and Flux – Edit and Kontext included.

Limitations

The LoRA was trained on a right hand only, to remove the bias faster, so it may not suit you if you are a lefty :)

Because it was trained on mostly synthetic data, you may encounter a thin iridescent line at the right edge of the frame. I made every effort to crop it out when preparing the dataset, but a trace still appears sometimes. (If you look closely, this trace can appear even in your LoRA-less generations, due to Wan2.2's nature.) The trace vanishes at higher resolutions.

When I was making the dataset, it was virtually impossible to get the prompt to work right at higher dimensions, even using the contraptions listed above, so I had to train at low res (< 512 max dim). This may result in minor blurriness.

Just like with vanilla Squish, the subjects' legs occasionally disappear. Although I tried my best to remove such cases from the regularization dataset, this effect still happens, and I don't know why. :( Maybe it's a Wan training limitation. (Try describing the legs and the footwear in the prompts.)

If the object is too close to the camera, it can trigger the stock behavior and spawn the second hand, so it's best to work with visibly remote objects.

Ending words

This LoRA was made to get rid of Squish's two-hand bias and to allow smashing items at a distance for more realism and shock effect. It was created in hopes of being fun and useful.

The best you can do for me is to share some examples of how it is working for you in the gallery, whatever rating they are :)

Of course, if you encounter a problem, leave a comment.

Credits to RemadeAI for the initial concept :3

Model type	LORA
Base model	LTXV2
Published	2026-01-25
Trigger words	handsquish
