Hunyuan Video LoRA. Anime: Akame ga Kill! Akame. v1
This is my first LoRA training. Here are the questions I have:
Which caption style works best? I followed a structure like: <...>, <...>, <who + visual description>.
What video resolution should I use? I used [768, 480]. Is it better to have videos at different resolutions or at one unified resolution?
How do I decide on the value of "frame_buckets = [1, 16, 33, 65, 97, 129]"? I picked these because the videos in my dataset range from 0.6 sec to 4.93 sec (the relevant part of my dataset config is sketched below).
What is the "video_clip_mode"? I selected the multiple_overlapping but why this instead of others.
Which of these is most important if I want to improve the quality of the LoRA:
A: collect more data;
B: make better captions;
C: collect data for only one task or motion?
Is it worth training the LoRA on a mix of images and videos, or only on videos?
It's hard to decide on optimal inference parameters because there are a lot of knobs you can change.
If anyone has answers to the questions above, I will be really happy to read them.
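For context, here is a rough sketch of the dataset side of the training config, written in the diffusion-pipe dataset.toml style. Only the resolution, frame_buckets, and video_clip_mode values come from what I described above; the directory path and num_repeats are placeholders, and the exact key layout may differ slightly from the real config linked below.

```toml
# Minimal sketch of a diffusion-pipe dataset.toml (placeholder path/num_repeats).

# Resolution bucket used for training; I trained at 768x480.
resolutions = [[768, 480]]

# Frame-count buckets; 1 is for still images. At ~24 fps my clips
# (0.6 to 4.93 sec) are roughly 14 to 118 frames, so buckets up to 129 cover them.
frame_buckets = [1, 16, 33, 65, 97, 129]

# How training clips are extracted from each video; diffusion-pipe also
# offers single_beginning and single_middle.
video_clip_mode = 'multiple_overlapping'

[[directory]]
# Placeholder path; each clip has a .txt caption file with the same basename.
path = '/data/akame_clips'
num_repeats = 1
```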
Description
A Hunyuan Video LoRA trained on short clips of Akame from the first episode of the anime: 29 clips in total, with an average length of 2.16 sec. Trained using the diffusion-pipe repo.
You can check the training configs, workflow, LoRA model, and all data here: akame_v1
Inference params.
lora_strength: 1.0
dtype: bfloat16
resolution: [[768, 480]] (width, height)
num_frames: 93 (about 3.9 sec at HunyuanVideo's 24 fps)
steps: 20
embedded_guidance_scale: 9.00 *note: I found this value worked well for my other LoRA, so I used it here too; it's worth experimenting with.
enhance video weight: 4.0 *note: I think this parameter can also be adjusted, and there are some other params in the enhance video node worth exploring.
Data
amount: 29 clips from 0.6 to 4.93 sec.
avg_length: 2.16 sec
The data was collected manually using the OpenShot program. It took around 1 hour to cut the 29 clips from one anime episode, plus about 1 hour to create captions for the clips using Sonnet 3.5 as the captioner and to manually correct its mistakes.
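To give a concrete idea of the captioning, here is an illustrative caption in the structure described above. It is not copied from the dataset, and the leading trigger word is a placeholder:

```
akame_v1, anime scene, Akame, a young woman with long black hair and red eyes in a dark uniform, draws her katana and dashes across a night-time rooftop toward the camera
```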
