Wan2.2 V2V VACE One-Click 'Seamless' Workflow Loop, Preserving Subject Details
Model description
"The power of the jiggle physics in the palm of my hands"
-Doc Ock or something like that
This is a VACE V2V workflow for Wan2.2, designed to take the subject from your reference image and swap it in for the subject in your reference video. In other words, the subject from the image does whatever action the subject in the video does.
V2: NOTES - IMPORTANT!!
I couldn't find a non-conflicting node that generates some sort of silence. Given the output limitations of what I could find/use, I ended up writing my own custom node for the situation. It's included in the download for this workflow: take the "Silence Generator" folder, move it (with its contents) into your ComfyUI "custom_nodes" folder, restart ComfyUI, and you should be good to go.
If you know of a node that can generate one second of silence, you can simply replace the custom node with that (and then tell me about it!)
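For the curious, a node like this is only a few lines. Here's a minimal sketch (illustrative only, not necessarily what ships in the download), assuming ComfyUI's AUDIO convention of a dict holding a [batch, channels, samples] waveform tensor plus a sample rate:

```python
import torch

class SilenceGenerator:
    """Outputs N seconds of digital silence as a ComfyUI AUDIO value."""

    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "seconds": ("FLOAT", {"default": 1.0, "min": 0.1, "max": 60.0, "step": 0.1}),
                "sample_rate": ("INT", {"default": 44100, "min": 8000, "max": 192000}),
            }
        }

    RETURN_TYPES = ("AUDIO",)
    FUNCTION = "generate"
    CATEGORY = "audio"

    def generate(self, seconds, sample_rate):
        # Silence is just an all-zero waveform: 1 batch item, 2 channels (stereo).
        waveform = torch.zeros((1, 2, int(seconds * sample_rate)))
        return ({"waveform": waveform, "sample_rate": sample_rate},)

NODE_CLASS_MAPPINGS = {"SilenceGenerator": SilenceGenerator}
NODE_DISPLAY_NAME_MAPPINGS = {"SilenceGenerator": "Silence Generator"}
```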
I'll update this section with quirks I've been seeing along the way. Not really bugs per se, but things to help resolve issues:
Problem: "LayerUtility: Purge VRAM V2 is not being found even though LayerStyle nodepack is up to date"
Solution: clone it directly into your custom_nodes directory from here: https://github.com/chflame163/ComfyUI_LayerStyle
Root Cause: ComfyUI registry possibly caching wrong versionProblem: "My VRAM is bad, and the models are too large for my machine. Slow/OOM"
Solution: Use a GGUF version. You may need different loaders. If you connect the model output to the set nodes input behind the loaders, it should work fine. If you're still confused and google isn't helping, let me know and I can give some guidance.
Root Cause: I created this on an H100 VM.Problem: "I'm getting a numpy error upon running the workflow from InspyrenetRembgAdvanced."
Solution: I've been seeing this time to time. I'm really not liking that node and may find a way to replace it soon. For now, if you run the workflow again, it ignores the error.
Root Cause: Node be jank.
Here's how it works in a nutshell (a couple of illustrative sketches follow the list):
1. You input your reference image, reference video, size, VACE models, iteration count, frames to process per iteration, overlap frames, and whatever other params.
2. It runs edge detection, pose, and some optional fancy masking on the current video slice. It also processes your image, padding it if it doesn't match your video's aspect ratio (sketched below).
3. It runs VACE processing on your video/ref image/etc.
4. It replaces the user-specified overlap frames in your VACE output with grey frames.
IF IT'S THE 0th INDEX (first iteration), IT BATCHES AND GOES TO THE NEXT ITERATION, ELSE...
5. It processes the transition frames in a second VACE pass, using the black/white masking trick, but still within this workflow (not a separate one).
6. The grey frames inserted in step 4 get replaced with the processed frames from step 5.
7. The last frame of the iteration gets sent back to the beginning of the workflow, the subject in it is masked out, and your original reference character is overlaid on top. (This is important: it stops the typical cooked look each iteration, and it's where this differs from the usual 'send the last frame over as reference' approach other workflows use.)
8. Steps 2 through 6 repeat until your iteration total is hit.
9. It chops off the very first grey frames from your overall video.
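If the node spaghetti is hard to follow, here's the same loop as a Python sketch. Everything in it is illustrative: detect_controls, run_vace, bridge_transition, and mask_and_overlay are hypothetical stand-ins for whole chains of nodes, and the slice arithmetic is my reading of the steps above, not gospel.

```python
# Illustrative sketch of the loop's frame bookkeeping - NOT the real node graph.

GREY = "GREY"  # placeholder frame covering the overlap/transition zone

def detect_controls(clip):            # step 2: edges + pose (+ optional mask)
    return clip

def run_vace(anchor, controls):       # step 3: main VACE pass
    return [f"vace({anchor}|{c})" for c in controls]

def bridge_transition(context):       # step 5: black/white mask trick -
    return [f"bridged({c})" for c in context]  # regenerate only the greys

def mask_and_overlay(frame, ref):     # step 7: mask subject, paste ref back
    return ref

def run_loop(ref_image, video, iterations, frames_per_iter, overlap):
    output, anchor = [], ref_image
    for i in range(iterations):
        clip = video[i * frames_per_iter : (i + 1) * frames_per_iter]
        frames = run_vace(anchor, detect_controls(clip))
        frames[:overlap] = [GREY] * overlap               # step 4: blank the seam
        if i > 0:                                         # 0th iteration skips 5-6
            context = output[-overlap:] + frames[:overlap * 2]
            frames[:overlap] = bridge_transition(context)[:overlap]  # steps 5-6
        output.extend(frames)
        anchor = mask_and_overlay(frames[-1], ref_image)  # step 7: re-anchor
    return output[overlap:]                               # step 9: drop lead greys
```

Under this reading, 3 iterations of 81 frames with an 8-frame overlap consume 243 source frames and return 235, since the leading greys get chopped at the end.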
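And step 2's image padding is conceptually just letterboxing the reference image onto a canvas that matches the video's aspect ratio. A minimal Pillow sketch (the grey fill color and function name are my assumptions; the workflow's actual resize/pad nodes may differ):

```python
from PIL import Image, ImageOps

def pad_to_video_aspect(img: Image.Image, video_w: int, video_h: int) -> Image.Image:
    """Fit the reference image inside the video's dimensions, padding the rest."""
    fitted = ImageOps.contain(img, (video_w, video_h))  # resize, keep aspect ratio
    canvas = Image.new("RGB", (video_w, video_h), (128, 128, 128))
    canvas.paste(fitted, ((video_w - fitted.width) // 2,
                          (video_h - fitted.height) // 2))
    return canvas
```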
Result?
You should now have a near-seamless video with VACE-processed transitions. Your subject shouldn't get messed up; the background may get a bit cooked at high iteration counts.
My intention with this workflow is for it to be easy to operate, even if the math and conditional nodes everywhere look kinda crazy.
Disclaimer: I'm running this on an H100. Unless your GPU is genuine 100% angus beef power and not a potato PC, you will almost certainly want to swap out the diffusion models/text encoders/etc. to make things run faster. I tend to put quality first and speed second.
Future places this can go:
Canny incorporation and/or maintaining the mouth shape better while removing the mask - medium priority
Understanding the pose blend better so I can incorporate more pose without it showing up in the final video as an object (maybe it's the color? the blend type? needs research) - medium priority
Audio - done (was highest priority; had to figure out some syncing stuff)
Background options: letting the user pick the source video's background, an image background, or a new T2V-style generated background - low priority
Keeping background consistency better from the initial image to the first generation (possibly needs some masking on the control video) - medium priority
Interpolation step, maybe upscale - done (was medium priority but easy to do; I just wanted to figure out the ideal way to do it for quality/speed)
Adding optional support for different image references on user-specified iterations; could add some cool possibilities - low priority
Changing out the deprecated Resize Image v1 node for v2 - medium priority, in progress
Bug fixes - medium priority depending on the bug
Personal notes:
This started out as a "there's a source video I like, but I hate how renaissance masks look and want to replace the person" project. Then I decided to loop the process. Then I thought "but what if I make it seamless" across the overlap, so I built a full FL2V step into it, but that wasn't seamless; it had coloration differences and jumps. Then I saw some "seamless" workflows on CivitAI. Those were neat! ...but they used filepaths and such in what was essentially a separate workflow. I wanted to click one button and process a full video, so I kept building this out. It's still definitely not perfect, and it doesn't give the exact 1:1 replacement I want, but I think it's pretty cool for what it does so far. From here on, it's mainly fine-tuning everything, then fixing edge cases and adding some more features.