ProductConsistency-Wan2.2-I2V-ConsistencyLoRA3
Model description
The samples in the showcase were generated using both the high- and low-noise models, with the lightning-low LoRA.
IT IS A LORA, IT IS NOT A WORKFLOW.
Hello everyone, long time no see. I apologize for not releasing more models recently; I've spent the last month researching innovative features for LoRAs on the Wan2.2-I2V model. I've finally achieved some results and would like to introduce this series of Wan2.2-I2V LoRAs, which I call the ConsistencyLoRA series. A ConsistencyLoRA takes an input image and, using the Wan2.2-I2V model, directly generates a video that stays highly consistent with that image.
ProductConsistency is the third model in this series. It is designed to generate product videos directly from a white-background product image, guided by a text prompt. From my testing (recommended strengths: high 0.9, low 0.9, lightning low 1.0), ProductConsistency performs quite well when the prompt is well written and generation is accelerated with the lightning low LoRA. However, compared to the two Consistency models I previously released, CarConsistency and ClothConsistency, its prompt adherence in advertising scenarios is only average, possibly because Wan2.2 saw less training data of this kind; a certain amount of prompt engineering is needed to get good results.

Note that I don't recommend writing prompts about the product subject itself, as this easily causes subject-consistency problems. Instead, as in the examples, state the product's category (this helps the T5 text encoder understand the product's actual shape and produces more vivid results), for instance: "The product is a canned drink / a box of chocolate / a bottle of perfume...". Also, because the training dataset contains no people, I'm not sure how well it generates product videos that include a person.

A sample prompt (for more, check the info attached to each video): "Product Consistency. Used the product in the first frame, generate a commercial-quality video of the product. The product is a canned drink. The product is floating in the air in the dark with some neon green light shining. Suddenly the product is grabbed by a huge monster hand. The overall atmosphere is dark and cyberpunk."
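The recommended prompt structure (fixed preamble, then product category, then scene) can be captured in a tiny helper. `build_prompt` and its parameter names are my own illustration, not part of the model's tooling:

```python
def build_prompt(category: str, scene: str) -> str:
    """Assemble a ProductConsistency-style prompt.

    Mirrors the sample prompt above: a fixed preamble, then the
    product's category (never its appearance, to avoid
    subject-consistency issues), then the scene description.
    """
    return (
        "Product Consistency. Used the product in the first frame, "
        "generate a commercial-quality video of the product. "
        f"The product is {category}. {scene}"
    )

prompt = build_prompt(
    "a canned drink",
    "The product is floating in the air in the dark with some neon "
    "green light shining. Suddenly the product is grabbed by a huge "
    "monster hand. The overall atmosphere is dark and cyberpunk.",
)
```

Keeping the preamble fixed and varying only the category and scene makes it easy to iterate on prompt engineering without accidentally describing the product itself.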
The goal of the ConsistencyLoRA series is to broaden the commercial application scenarios of I2V (image-to-video) models. ConsistencyLoRA was trained before the release of Wan Fun VACE and Wan Animate. Compared to them, its drawbacks are that the generated video includes leading frames that echo the input image (these can be removed by frame trimming; I've uploaded a CutFrame.ipynb script for this), and the output can sometimes be blurry. ConsistencyLoRA has several advantages, however:

1. Based on the Wan I2V workflow: simple, convenient, and accessible, with low VRAM requirements, and compatible with various other I2V-based LoRAs.

2. Prompt-based generation without video replacement: scenes can be created rapidly from prompts alone, T2V-style, without needing a source video to swap content into. With ClothConsistency, for example, prompts can generate models of different ethnicities, skin tones, and body types wearing the same outfit. And because no video replacement is involved, lighting and shadows look more natural.

3. Lower training cost and task-specific stability: the training cost is relatively low, and the LoRA can be trained for a specific consistency task. Because it is trained for a particular purpose, its stability on that task is higher than that of Wan Animate or VACE, as the CarConsistency scenario demonstrates.
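The CutFrame.ipynb script itself isn't reproduced here. As an illustration only, here is a minimal frame-trimming sketch that shells out to ffmpeg (assumed to be installed); the frame count and the 16 fps rate are assumptions you should adjust to your actual output:

```python
import subprocess

def build_trim_cmd(src: str, dst: str, n_frames: int, fps: int = 16) -> list[str]:
    """Build an ffmpeg command that drops the first n_frames of a clip
    by seeking past them before re-encoding."""
    start = n_frames / fps
    return [
        "ffmpeg", "-y",
        "-ss", f"{start:.3f}",   # seek past the unwanted leading frames
        "-i", src,
        "-c:v", "libx264", "-pix_fmt", "yuv420p",
        dst,
    ]

def trim_leading_frames(src: str, dst: str, n_frames: int, fps: int = 16) -> None:
    """Write a copy of src to dst with the first n_frames removed."""
    subprocess.run(build_trim_cmd(src, dst, n_frames, fps), check=True)
```

Placing `-ss` before `-i` makes ffmpeg seek on the input, which combined with re-encoding gives a frame-accurate cut at the computed offset.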
I handled the entire process independently, from the LoRA concept and dataset preparation to training and hyperparameter tuning. Due to the 24 GB VRAM limit of a single RTX 4090, I can currently train only with a [360, 360] latent, so the model is still at the prototype stage. If the results are not ideal, I ask for your understanding and feedback, and I will do my best to improve it. Thank you for reading this far. Commercial use of this model requires a license (I'm hoping to at least cover the electricity cost of training, lol). If you can support my experiments with higher-VRAM compute (to attempt a larger latent and address the blurriness issue), or if you are interested in a commercial collaboration to train a LoRA for a specific product, please DM me on Civitai or contact me via QQ: 3387286448. Thank you so much. Donation: https://ko-fi.com/ghostshell
