ClothConsistency-Wan2.2-I2V-ConsistencyLoRA2
Model description
The samples in the showcase use both the high-noise and low-noise LoRAs together with the lightning-low LoRA.
IT IS A LORA, NOT A WORKFLOW.
Hello everyone, long time no see. I apologize for not releasing more models recently; I've spent the last month researching innovative features for LoRAs on the Wan2.2-I2V model. I've finally achieved some research results and would like to introduce this series of Wan2.2-I2V LoRAs, which I call the ConsistencyLoRA series. These LoRAs take an input image and, using the Wan2.2-I2V model, directly generate a video that remains highly consistent with that image.
"ClothConsistency" is the second model in this series. It is designed to generate a video of a person wearing a specific garment directly from a product-shot image of the garment (on a white background) and a text prompt (some example prompts are available for reference).
From my personal testing (recommended strengths: high 0.7, low 0.9, lightning-low 1.0), ClothConsistency performs very well when paired with well-crafted prompts and the lightning-low LoRA for acceleration. For example, it maintains clothing consistency for the "skateboarding boy," accurately renders the patterns on Hanfu (traditional Chinese clothing), captures the light and shadow on a dancing girl's jacket, and reproduces the pattern on an LV jacket.
Regarding how to write effective prompts: the model was trained with clothing categories such as coat, skirt, shirt, jacket, dress, and pants, so specifying the garment type in your prompt yields better results. For instance, for a jacket the prompt should be "used the jacket in the first frame, generate a video of a model wearing the jacket". If the outfit consists of multiple pieces, it is recommended to label each item's type separately (e.g., "used the sweater and trousers in the first frame, generate a video of a model wearing the sweater and trousers"). This approach produces more stable and reliable results. One drawback: some random seeds produce an excessive number of preceding frames; the fix is to change the random seed or the prompt.
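The prompt pattern above is mechanical enough to script. Below is a small sketch (the `build_prompt` helper is my own illustration, not part of the model's tooling) that assembles a prompt from a list of garment types:

```python
def build_prompt(garments):
    """Assemble a ClothConsistency prompt that names each garment type.

    Follows the recommended pattern:
    "used the <items> in the first frame,
     generate a video of a model wearing the <items>"
    """
    items = " and ".join(garments)
    return (f"used the {items} in the first frame, "
            f"generate a video of a model wearing the {items}")

# Single garment:
print(build_prompt(["jacket"]))
# Multi-piece outfit, each type labeled separately:
print(build_prompt(["sweater", "trousers"]))
```

The same pattern extends to any of the trained categories (coat, skirt, shirt, dress, pants).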
The goal of the ConsistencyLoRA series is to broaden the commercial application scenarios for I2V (image-to-video) models. ConsistencyLoRA was trained before the release of Wan Fun VACE and Wan Animate. Compared to them, its drawbacks are that the generated video includes preceding frames copied from the input image, which can be removed by frame trimming (I've uploaded a CutFrame.ipynb script for this), and that the output can sometimes be blurry. However, ConsistencyLoRA has several advantages:
1. Based on the Wan I2V workflow: it is simple, convenient, and accessible, with low VRAM requirements, and it is compatible with various other I2V-based LoRAs.
2. Prompt-based generation without video replacement: generating directly from prompts allows scenes to be created rapidly, as in T2V (text-to-video), without needing a source video to swap content into. For example, with ClothConsistency you can use prompts to generate models of different ethnicities, skin tones, and body types wearing the same outfit. And because no video replacement is involved, the lighting and shadows appear more natural.
3. Lower training cost and task-specific stability: the training cost is relatively low, and the LoRA can be fine-tuned for a specific consistency task. Because it is trained for a particular purpose, its stability on that task is higher than that of Wan Animate or VACE, as demonstrated in the CarConsistency scenario.
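Since CutFrame.ipynb itself is not reproduced here, the sketch below only illustrates the trimming idea: the preceding frames are near-copies of the input image, so leading frames can be dropped until one differs noticeably from the first. The function names and tolerance value are my own assumptions, not the actual logic of the script:

```python
def count_leading_static_frames(frames, tol=2.0):
    """Count leading frames that are nearly identical to frame 0.

    `frames` is a list of flattened pixel sequences (e.g. grayscale
    values); `tol` is the allowed mean absolute per-pixel difference.
    """
    if not frames:
        return 0
    ref = frames[0]
    count = 1  # frame 0 is static by definition
    for frame in frames[1:]:
        diff = sum(abs(a - b) for a, b in zip(ref, frame)) / len(ref)
        if diff > tol:
            break  # first frame with real motion
        count += 1
    return count

def trim_leading_frames(frames, tol=2.0):
    """Drop the static preceding frames so the video starts with motion."""
    return frames[count_leading_static_frames(frames, tol):]
```

In practice the frames would come from a video decoder (e.g. OpenCV or imageio), and the trimmed list would be re-encoded; only the counting logic is shown here.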
I handled the entire process independently, from the LoRA concept and dataset processing to training and hyperparameter tuning. Due to the 24 GB VRAM limit of a single RTX 4090, I can currently only train with a [360, 360] latent, so the model is still at the prototype stage. If the results are not ideal, I ask for your understanding and feedback, and I will do my best to improve it. Thank you for reading this far. Commercial use of this model requires a license (I'm hoping to at least cover the electricity costs of training, lol). If you can support my experiments with higher-VRAM compute (to try a larger latent space and address the blurriness), or if you are interested in a commercial collaboration to train a LoRA for a specific product, please DM me on Civitai. Thank you so much. Donation: https://ko-fi.com/ghostshell
Chinese-speaking users can also reach me for licensing or collaboration via QQ: 3387286448.
