WAN 2.2 4-Stage SVI Promptorama for Nice Long Videos
Details
Download Files
About this version
Model description
Bug #1: I knew I screwed up the latent switch... BYPASS the 'latent directory' node at the very top when you are not continuing, or it will screw up the SVI embedding on a fresh run. Sorry, I need to fix that. I think I deleted the switch because of a null problem, so I'll use a bypass instead; the SVI-specific prompt concat can be switched out of the first-stage embedding stack as well, so I can probably do both at once. Not sure why this is acting up all of a sudden... it's been working fine for a while. Go figure.
Ok, I've finally got this somewhat presentable for sharing. There are plenty of SVI workflows posted by now, most of them better than mine, I'm sure. I really like SVI though, so the more the merrier. The main idea here is to automate the switching of the prompts and LoRAs, so you can just pick a preset scenario and crank it out. No typing, no selecting from drop-downs, no "oh shit, look what I did, they're all backwards" scenarios. You should definitely be proficient with Comfy and WAN before trying SVI; do not jump straight into this stuff if you're just getting started. You will lose hair. It's still pretty messy, and it sure as hell isn't one of these magnificent beasts. My starting point was this, which is great and a much easier intro to SVI if that's what you need. But this is now my main WF, and I'm really pleased with the way it came out. Switching four prompts and four LoRAs independently is seriously annoying.
Here are the main features:
Each stage gets its own set of prompts. You can pick from one of 16 sets (I have only completed 7 of them so far; the rest are currently empty), and you only need TWO switches to do it. First pick a color: brown, red, yellow, or cyan. These are the big blocks, with four sets in each of them. Then pick a position; you don't have to worry about the indices, I have labeled them Top Right, Top Left, Bot Right, and Bot Left, so it should be clear enough to pick a set within a color group. Selection is done with the two Fast Group Muter nodes. If I post any updates to this WF, I'm sure it will have more slots filled in. It is annoying to get a group all set and locked in, but once it's done, it's done.
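If it helps to think about what the two muters are actually doing: the selection is just two indices into a 4x4 grid of slots. Here's a throwaway Python sketch of the equivalent logic (the names and index order are purely illustrative, not the actual node wiring):

```python
# Hypothetical sketch of the two-switch selection: the color muter picks one
# of four blocks, the position muter picks one of four sets inside it.
COLORS = ["brown", "red", "yellow", "cyan"]
POSITIONS = ["Top Right", "Top Left", "Bot Right", "Bot Left"]

def pick_set(color: str, position: str) -> int:
    """Return a flat slot index (0-15) for a (color, position) pair."""
    return COLORS.index(color) * 4 + POSITIONS.index(position)
```

Two small switches instead of sixteen, which is the whole point.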
You do NOT need to put image quality or SVI-specific motion prompts into the sets- I've already done that. It happens in the stages themselves. If you do need to change any of that, look at the two concats that take the GET nodes and go to the T5 encoder. Each stage has them. That's where you can put specific transition/picture quality stuff. So keep your prompts limited to actions and descriptions.
Here's the nice part: setting your prompt also sets your LoRAs, so you only need to do it once for every set you make. Take a journey down into the subgraph to configure these. The groups are laid out as exact copies of the promptage, so just throw the syntax into the appropriate nodes. All of the sub-subgraphs were initially Manager loaders, but that caused huge lag, so they are strings now. I did put loaders on the side, so if you need previews and auto-complete, work in there and copy your string to the set. I took a snapshot of my filtered 2.2 LoRAs and popped it into a load image node; I suggest you do the same, since a sheet of thumbnails of exactly what you have is very helpful when setting up your groups. Of course you don't have to use the LoRA Manager; the data is strings until it gets to the stages, and that's where you would change loaders.
You can start with an image or a frame from a video with the INPUT radio button.
To continue from the last saved latent in your latent directory, click the USE LAST LATENT radio button. Obviously you need to have one saved first; the default directory is 'latents' in your Comfy output folder. I'm still working on this stage, so it might be buggy. If the output starts from the original frame instead of the last frame, just grab the frame you want, put it in the anchor spot of the first SVI node, and make sure the latent gets to the 'prev_latent' input (this spot is normally empty on a first run). I'll get this straightened out if it's not working right, and I'm going to add an option to embed frames from an already-encoded video like you can with my SVI FLF workflow, so you can continue straight from a video (and get motion data instead of just grabbing a frame, which is the point of SVI).
The default model is /model/2053259?modelVersionId=2477539. I really like this one; great camera prompting. Then there is SVI PRO, of course. If you want to use full models with lightning LoRAs, there are loaders in there for them, disabled by default. Strength for lightning and SVI is set at the top left with all the loaders. Oh, and don't forget the T5 encoder that works with the wrapper. It's a naughty encoder.
I've got my own preferred resolutions pre-set with a switch and an aspect-ratio flipper; you can ditch all this if it irks you. The resized input goes through Contrast Adaptive Sharpening. This is really important, and I highly suggest you try it. I guarantee you that half of your generation faceplants are from garbage input. Ask me how I know. CAS won't fix garbage, but what it does fix is the weird blurring you sometimes get from resizing. I've put a comparison node in there so you can slide to the middle and have one hand free to slap your forehead.
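If you're curious why CAS behaves better than a plain unsharp mask on resized frames, the core idea is that the sharpening strength adapts per pixel: it backs off where the local neighborhood already has high contrast, so edges don't ring. This is a simplified NumPy sketch of that idea, not the actual FidelityFX CAS math or the node's implementation (it operates on a single-channel float image in [0, 1]):

```python
import numpy as np

def cas_like(img: np.ndarray, amount: float = 0.8) -> np.ndarray:
    """Simplified contrast-adaptive sharpen (illustrative, not real CAS)."""
    p = np.pad(img, 1, mode="edge")
    # 4-neighbor cross around each pixel
    n = p[:-2, 1:-1]; s = p[2:, 1:-1]; w = p[1:-1, :-2]; e = p[1:-1, 2:]
    mn = np.minimum.reduce([img, n, s, w, e])
    mx = np.maximum.reduce([img, n, s, w, e])
    # adaptive weight: near 1 where local contrast is low, near 0 where high
    adapt = np.sqrt(np.clip(np.minimum(mn, 1.0 - mx) / np.maximum(mx, 1e-6), 0.0, 1.0))
    k = -amount * adapt / 4.0  # per-pixel negative neighbor weight
    out = (img + k * (n + s + w + e)) / (1.0 + 4.0 * k)
    return np.clip(out, 0.0, 1.0)
```

Flat areas pass through untouched, soft resize blur gets crisped up, and hard edges are mostly left alone. That's exactly the failure mode of resized input that the comparison node lets you eyeball.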
Uhh, what else is there... oh, filenames: check the string nodes and concats and set your preferred filing, prefix, suffix, etc. If you want to save straight to video you can, but I'd advise saving frames. That's the default; it makes a unique folder (with an iterated integer suffix so it never saves into a previous folder; the number doesn't mean anything). A video is also set to save at the end, but it is tagged as a preview, with a high CRF.
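The unique-folder behavior is just the usual "bump an integer suffix until the path is free" trick. A Python equivalent looks like this (the `run_` prefix and padding here are made up for illustration, not what the workflow actually emits):

```python
from pathlib import Path

def unique_folder(base: Path, prefix: str = "run") -> Path:
    """Create run_0001, run_0002, ... skipping any suffix that already exists."""
    i = 1
    while (base / f"{prefix}_{i:04d}").exists():
        i += 1
    folder = base / f"{prefix}_{i:04d}"
    folder.mkdir(parents=True)
    return folder
```

That's why the number doesn't mean anything: it only guarantees you never dump frames into an earlier run's folder.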
Unfortunately, this setup is demanding enough that I had to exclude an upscale and interpolation stage. You can add one, but it requires a lot of extra offloading and purges that can screw up your next generation. I've offloaded that stage to a different machine; a Mac Studio with an M2 Ultra actually handles it great, which surprised me. 4x upscale models can freak it out with huge batches, but for the most part it works fine. But I digress. Point is, the output is raw WAN.
Stages 1, 2, and 3 have their own previews; the last preview is the concatenated result. The 1+2, 1+2+3, etc. previews are there, but hidden and minimized in their stage groups. Too many previews.
Is that everything? Ah yes, the bottom-left prompt box has LoRA loaders next to it (these will give you preview thumbs). That's there so you can do your experimental setups without going into the subgraph: play around, get it nailed down, then copy it to an open slot.
Obviously, by now, this is using WRAPPER nodes and LoRA Manager, 13 custom node packs in all. That's not a lot. Get all of them. If an idiot like me has a node, there is no reason at all why a genius like you does not. Sometimes you can get away with substituting core nodes, sometimes you can't. Some people actually know what they are doing. I hope to be one of those people when I grow up.
There are copious notes in the WF. Anything noteworthy has a note. Please do let me know if you use this and encounter bugs. I love being embarrassed. And it's hard to debug every permutation of every setting, so there are bound to be landmines in here.
Oh did I mention that this workflow is a monster? It will spit out 309 frames. I run comfy on a Cray X-MP with a neural net processor, a learning computer. But it gets so hot when I run this that I can't even sit on the seats. You have been warned: do not sit on the computer - you will get burned.
128GB of RAM will get you through safely. You'll probably have a few percent left over at the end, but unload what you can first. Definitely turn off defender, firewalls, and anti-virus stuff whenever using any of my workflows. They suck RAM like vacuums.
Try VRAM too. You need some of that, but not as much as the RAM. If you are one of those GGUF pussies, I can't help you. Go away.

