FACESITTING HunYuan
Model description
MAJOR UPDATE: 12/26/2024
I finally got the LORA to a state where I'm really happy with it. Version 1.0 is the result of an exhaustive series of versions (that I never posted on here), each of which turned out to be insufficient until I eventually landed on this one. Unlike version 0.1, which I trained locally, I eventually had to rent an H100 NVL (a GPU with 93 GB of VRAM) and train this one in the cloud.
Major changes
#1: Dataset has been tripled allowing the LORA to learn a lot more
#2: The LORA was trained with a max resolution of 2048 (vs. 512 previously), allowing for much more detailed facesitting scenes
#3: It produces facesitting much more reliably than 0.1.
I'm labeling this 1.0 because I'm happy enough with it now that I can consider it 1.0 ready. There were actually versions 0.2->0.9 along the way, too.
0.2: the same as 0.1 but with another 2k steps. It was only marginally better. Direct comparisons showed almost no difference.
0.3: Attempted to correct this by training another 2k steps but with the learning rate jacked up. It caused the model to become too cartoonish for some reason.
0.4: Started all over from scratch but this time with 4e-5 learning rate instead of 2e-5. This one got burned after 6k steps. And earlier steps just didn't quite look right.
0.5: Tripled the amount of training data and lowered the learning rate. Initially, it seemed to work, but it was too inconsistent and somewhat burned in.
0.6: restarted all over again, but this time I used both the larger dataset AND I lowered the learning rate back to 2e-5. The end result was good enough I ALMOST released it on here, except it was just a bit too blurry.
0.7 / 0.8 / 0.9: added [2048] to the resolutions so that it contained [512,1024,2048].
0.7 OOM'd halfway through and crashed my 4090. I increased gradients for 0.8 and it ended up learning almost nothing. I found a balance for 0.9 and it fit on my 4090 but so slowly it was basically going to take months to train.
1.0: I rented an H100 NVL and let it run for about 16 hours. The 2048 resolution made even that GPU go very, very slowly. But at least it got the job done. I'm actually still running it as I type this. If subsequent epochs result in an even better LORA, I will ofc upload those.
I've never uploaded anything to this website before, so forgive me if I do something that isn't in line with the typical protocol. The only reason I trained this LORA is that I wanted it badly enough that I didn't want to wait for someone else to do it. It took me quite a few hours, many of which involved figuring out how to get Linux dual booted, as the trainer I used only works on Linux (Idk if there are any Windows-compatible ones yet).
THIS is an EXTREMELY early beta version of a LORA I hope to continue to cook and make perfect. But even in its early state, it's already showing enough real promise that I think it's worth uploading. You SHOULD expect to occasionally generate either (A) a nonsensical/body-horror result, or (B) a video that doesn't show facesitting. But when it does work (and it works fairly often, at least in my early testing) it looks amazing (to me, anyway lol).
Sample prompt that worked most for me was: A very high quality cinematic video of a stunningly beautiful woman sitting on a man's face at the office where she works, the woman is wearing a pair of blue jeans and white tanktop, the woman has brown hair in a bun. The man's face is completely trapped underneath the woman's buttocks, she's facesitting him. The man struggles to breathe and get his head free, the man's nose is pressed into the woman's ass. The video was taken in a very crowded office. The other people there point and laugh as the man suffocates under ass.
You can just use this directly, substitute your own desired looks/clothing/setting, or even change it completely. For some weird reason, the LORA seems to perform best in the 4-5 second range. It can absolutely work at longer lengths, but at >=6 seconds more wonky things can begin to happen.
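If your workflow asks for a frame count rather than a length in seconds, the 4-5 second range can be converted with a small helper. This is only a rough sketch: the 24 fps default and the "frame count must be a multiple of 4 plus 1" rule are assumptions about typical HunyuanVideo setups, and `frames_for_duration` is a made-up name, so check what your own pipeline expects.

```python
def frames_for_duration(seconds: float, fps: int = 24) -> int:
    """Return the frame count of the form 4*k + 1 closest to seconds * fps.

    Assumptions (not from the trainer or the LORA itself):
    - fps=24 is the output frame rate of your pipeline
    - the pipeline only accepts frame counts of the form 4*k + 1
    """
    raw_frames = seconds * fps
    # Snap to the nearest value of the form 4*k + 1, never going below 1 frame.
    k = max(0, round((raw_frames - 1) / 4))
    return 4 * k + 1


if __name__ == "__main__":
    # Frame counts for the range where the LORA seemed most stable in testing.
    for secs in (4, 5):
        print(secs, "seconds ->", frames_for_duration(secs), "frames")
```

Under those assumptions, staying in the 4-5 second range means requesting somewhere around 97 to 121 frames.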