Based on FramePack.

Version 1.5, trained exclusively on a single video, shows improved motion dynamics and more coherent action sequences than version 1.0. However, this approach has caused overfitting in some areas, such as unnatural limb proportions, which I am currently addressing. For best results, I suggest generating 7.5-second clips at a resolution of 448x752.
You can use "cowboy shot, hands on hips" in the prompt to generate the first image.

This FramePack LoRA was trained with Musubi Tuner on 13 videos to generate the big swing dance.

Training took approximately 24 hours on an RTX 4090. I highly recommend using a GPU with more than 24 GB of VRAM for training.

I have only tested this in BF16 precision and have not evaluated it in FP8 precision.

Thanks to 青龙圣者 for answering some questions about training parameters.

So I’m wondering if a single LoRA could simply amplify all motion amplitudes significantly.

On the RTX 4080 32GB, generation takes an average of about 1 minute per second of video.
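
As a rough planning aid, the figures above can be turned into a quick time estimate. This is only a minimal sketch: the function name and constants are made up for illustration, and it assumes generation time scales linearly with clip length at the reported average, which may not hold on other hardware or settings.

```python
# Hypothetical helper: estimate generation time from clip length.
# Assumes linear scaling at the average rate reported above.

RECOMMENDED_SECONDS = 7.5        # clip length suggested on this page
MINUTES_PER_VIDEO_SECOND = 1.0   # average rate reported on this page


def estimate_generation_minutes(video_seconds, minutes_per_second=MINUTES_PER_VIDEO_SECOND):
    """Return the estimated generation time in minutes for a clip."""
    if video_seconds < 0 or minutes_per_second < 0:
        raise ValueError("inputs must be non-negative")
    return video_seconds * minutes_per_second


if __name__ == "__main__":
    minutes = estimate_generation_minutes(RECOMMENDED_SECONDS)
    print(f"Estimated time for a {RECOMMENDED_SECONDS}s clip: {minutes} minutes")
```

Under these assumptions, a clip of the recommended length would take roughly seven and a half minutes; pass a different `minutes_per_second` if your own GPU is faster or slower.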

Model type	LORA
Base model	Hunyuan Video
Published	2025-06-11
Trained words	One person is dancing to the dance dabaichui.The person performs a series of confident dance moves, arching her back, raising her arms behind her head, and swaying her long hair to the rhythm.