Wan 2.2 Video + Voice + Motion Control All-In-One workflow optimized for RTX 3060 12 GB VRAM GPU
Special thanks to:
@soulseeker for sharing his knowledge and giving the first crucial hints.
Features:
This workflow semi-automatically generates a "simple" video with audio. I designed it as an all-in-one solution. All you need is a start image.
- Works perfectly on an RTX 3060 with 12 GB VRAM and 32 GB RAM plus a large swap file (at least 32 to 64 GB).
- Easy installation (all necessary models linked).
- Easy to use via switch options.
- High-quality output.
The workflow includes 4 simple steps:
1. Edge Text-to-Speech: generates simple audio,
2. Generation of a motion control video for DWPose,
3. InfiniteTalk: generates the motion-controlled and audio-synchronized LQ video,
4. Upscaling and frame rate multiplication for smooth HQ output.
Videos of around 5 seconds should work well. Videos of up to 8 or 10 seconds may be possible, but I have not tested that yet.
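To get a rough feeling for what these durations mean in frames, here is a small Python sketch. It is only an illustration: `frame_budget` is a hypothetical helper, and the base rate of 16 fps and the 2x multiplier are assumed example values, not settings taken from this workflow. Check the actual values in the workflow nodes.

```python
def frame_budget(seconds: float, base_fps: int = 16, multiplier: int = 2) -> dict:
    """Estimate frame counts before and after frame rate multiplication.

    base_fps and multiplier are assumed example values, not workflow settings.
    """
    if seconds <= 0 or base_fps <= 0 or multiplier < 1:
        raise ValueError("seconds and base_fps must be > 0, multiplier >= 1")

    # Frames the LQ generation step has to produce.
    base_frames = round(seconds * base_fps)

    # Frame rate after multiplication. The exact output frame count depends on
    # the interpolation node, so the value below is only an approximation.
    output_fps = base_fps * multiplier
    approx_output_frames = base_frames * multiplier

    return {
        "base_frames": base_frames,
        "output_fps": output_fps,
        "approx_output_frames": approx_output_frames,
    }


if __name__ == "__main__":
    for duration in (5, 8, 10):
        print(duration, frame_budget(duration))
```

Under these assumptions, a 10 second clip needs twice as many generated frames as a 5 second clip, which is why longer videos are harder to fit on a 12 GB card.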
This workflow is in an initial "alpha" state. Technically, everything should work, so I believe it is a good basis for first simple tests and hopefully some fun 🙂
But I'm pretty sure there is a lot to improve, for example:
- a much better text-to-speech solution for better audio control, such as emphasis, speed, pauses, etc.
- improved motion and camera control, etc.
Attention:
This workflow is intended for more advanced ComfyUI users. Even though installation and usage should be very simple, this workflow is just a basis for testing and development, and you might need some ComfyUI knowledge to use it. Please understand that I will not provide installation or ComfyUI support here.
If you are a beginner with video generation and more complex workflows, I recommend my other video generation workflow. That one has been well tested and is already much better documented and commented.
This workflow is based on official templates and several already published workflows. I just put the different parts together, created a hopefully easy-to-use “design” and optimized everything for 12 GB VRAM.