Based on FramePack.

The 1.5 version, trained exclusively on a single video, exhibits improved motion dynamics and more coherent action sequences compared to version 1.0. However, this approach has led to overfitting in certain areas, such as unnatural limb proportions, which I am currently addressing. For optimal results, I suggest generating content with a duration of 7.5 seconds and a resolution of 448x752.
You can use the tags "cowboy shot, hands on hips" to generate the first image.

This FramePack LoRA was trained with Musubi Tuner on 13 videos to generate the big swing dance.

Training took approximately 24 hours on an RTX 4090. I strongly recommend a GPU with more than 24 GB of VRAM for training.

I have only tested this at BF16 precision; I have not evaluated it at FP8 precision.

Thanks to 青龙圣者 for answering some questions about training parameters.

So I’m wondering if a single LoRA could simply amplify all motion amplitudes significantly.

On the RTX 4080 32GB, generation takes about 1 minute per second of video on average.
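
As a rough sketch, the suggested settings and the reported speed can be turned into a quick time estimate before starting a run. The 30 fps value is an assumption that is not stated in these notes, and the helper name `estimate` is purely illustrative; adjust both to match your own pipeline and hardware.

```python
# Suggested settings taken from the notes above.
WIDTH, HEIGHT = 448, 752
DURATION_S = 7.5

# Reported average speed: about 1 minute of compute per second of video.
MINUTES_PER_VIDEO_SECOND = 1.0

# Assumption: 30 fps output. This is not stated in the notes; change as needed.
ASSUMED_FPS = 30


def estimate(duration_s=DURATION_S, fps=ASSUMED_FPS,
             min_per_s=MINUTES_PER_VIDEO_SECOND):
    """Return (frame_count, estimated_minutes) for a clip of the given length."""
    frames = round(duration_s * fps)
    minutes = duration_s * min_per_s
    return frames, minutes


if __name__ == "__main__":
    frames, minutes = estimate()
    print(f"{WIDTH}x{HEIGHT}, {DURATION_S}s: ~{frames} frames, ~{minutes:.1f} min")
```

Actual timings will vary with GPU, precision, and sampler settings, so treat the result as a ballpark figure only.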

Model Type	LORA
Base Model	Hunyuan Video
Published	2025-06-11
Trained Words	One person is dancing to the dance dabaichui.The person performs a series of confident dance moves, arching her back, raising her arms behind her head, and swaying her long hair to the rhythm.