Simple Wan Video I2V 720 on 8GB+ VRAM
Create videos with wan2.1-i2v-14b-720p-q6_k even on 8GB of VRAM (16GB or more is better).
Update! v3:
Test generations used 576*1024 images: 89 frames in total at a frame rate of 22, 4 steps, plus x2 interpolation up to 44 frames per second.
8GB VRAM: 20-25 min
16GB VRAM: 7-12 min
32GB VRAM: 1-5 min
__________________________________________________________________________________________________
v1:
Test generations used 512*512 images: 45 frames in total at a frame rate of 16, 25 steps, plus x2 interpolation up to 24 frames per second.
8GB VRAM: 40-70 min
16GB VRAM: 15-25 min
32GB VRAM: 6-15 min
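To compare the two versions more fairly, the timings above can be converted to seconds per generated frame. This is only a rough sketch using the numbers listed above; the function name is made up for illustration, and keep in mind that v3 was tested at a higher resolution (576*1024 vs 512*512):

```python
def seconds_per_frame(minutes_low, minutes_high, frames):
    """Return (best, worst) seconds spent per generated frame."""
    return (minutes_low * 60 / frames, minutes_high * 60 / frames)

# 16GB VRAM timings taken from the lists above
v3 = seconds_per_frame(7, 12, 89)    # v3: 89 frames, 4 steps
v1 = seconds_per_frame(15, 25, 45)   # v1: 45 frames, 25 steps

print(f"v3: {v3[0]:.1f}-{v3[1]:.1f} s per frame")
print(f"v1: {v1[0]:.1f}-{v1[1]:.1f} s per frame")
```

On 16GB this works out to roughly 5-8 seconds per frame for v3 against 20-33 seconds per frame for v1.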
__________________________________________________________________________________________________
! Important !
To make it work, you'll need:
Models:
Self-Forcing / CausVid / AccVid LoRA (for the v3 workflow)
Any CLIP vision (I use CLIP-ViT-H-14-laion2B-s32B-b79K.safetensors)
Any Wan VAE (I use wan_2.1_vae.safetensors)
Nodes: just install the missing nodes in the manager.
How to use
Upload an image that will be animated. The size of the image is important because the width and height of the video are taken from the source.
In the WanImg2Vid node, set the desired number of frames to be generated.
In the Video Combine node, set the desired number of frames per second.
That's it. Start generating.
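Since the video size comes straight from the source image, it can help to check what resolution an image will give you before uploading it. Below is a small sketch that scales an image size down to fit a maximum side length while keeping the aspect ratio. The function name, the 1024 limit and the rounding to a multiple of 16 are assumptions for illustration, not something the workflow itself requires:

```python
def fit_size(width, height, max_side=1024, multiple=16):
    """Scale (width, height) to fit max_side, keeping the aspect ratio.

    Both sides are rounded down to a multiple of `multiple`
    (assumption: sizes like this are a safe choice for video models).
    """
    scale = min(1.0, max_side / max(width, height))  # never upscale
    new_w = max(multiple, int(width * scale) // multiple * multiple)
    new_h = max(multiple, int(height * scale) // multiple * multiple)
    return new_w, new_h

print(fit_size(1152, 2048))  # a 9:16 portrait photo
print(fit_size(512, 512))    # already small enough, stays unchanged
```

Resizing the image to the printed size in any image editor before uploading gives a video of exactly that width and height.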
You can also enable the Interpolate and extend frame group. This will improve video quality and
How it works for me
I generate video at 16 frames per second, usually 45-55 frames per run.
That gives about 3-3.5 seconds of video (49 frames / 16 fps = 3.06 s). The generated frames are then sent for x2 interpolation. In the end I get a smooth 4-5 second video at 24 frames per second (49 frames * 2 = 98 frames; 98 frames / 24 fps = 4.08 s), so I gain smoothness and about 1 extra second of video.
x2 interpolation inserts one new frame between each pair of original frames. If you want to increase the multiplier, it must be a multiple of 2. A multiplier of 4 inserts 3 frames between each pair of original frames.
I don't use a multiplier of 4 because there is a risk of getting a slow-motion effect.
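The arithmetic above can be written out as a tiny helper. This is only a sketch: the function names are hypothetical, and the frame count after interpolation uses the same simple frames * multiplier estimate as above (the actual interpolation node may return a frame or so fewer):

```python
def duration_seconds(frames, fps):
    """Length of a clip in seconds: frame count divided by playback fps."""
    return frames / fps

def interpolated_frames(frames, multiplier=2):
    """Estimated frame count after interpolation (simple estimate)."""
    return frames * multiplier

original = 49  # frames generated by the sampler

# Before interpolation, played at 16 fps
print(f"{duration_seconds(original, 16):.2f} s")

# After x2 interpolation, played at 24 fps
smooth = interpolated_frames(original, 2)
print(smooth, f"{duration_seconds(smooth, 24):.2f} s")
```

The same helper shows where the slow-motion risk comes from: with a x4 multiplier, 49 * 4 = 196 frames played at 24 fps would stretch the same motion over about 8 seconds.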
I've found the settings that work best for me.
For KSampler:
steps 22-35
cfg 4-5
sampler: uni_pc
scheduler: simple
denoise 0.95-1
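For reference, here are the same settings as a Python dict, picking one value from the middle of each range. The key names follow the KSampler input names as far as I know them (steps, cfg, sampler_name, scheduler, denoise); the exact values picked and the small range check are just an illustration, not part of the workflow file:

```python
# Ranges from the list above: (low, high)
KSAMPLER_RANGES = {
    "steps": (22, 35),
    "cfg": (4.0, 5.0),
    "denoise": (0.95, 1.0),
}

# One example pick from inside each range
ksampler_settings = {
    "steps": 28,
    "cfg": 4.5,
    "sampler_name": "uni_pc",
    "scheduler": "simple",
    "denoise": 1.0,
}

def out_of_range(settings, ranges):
    """Return the names of settings that fall outside the given ranges."""
    return [name for name, (low, high) in ranges.items()
            if not low <= settings[name] <= high]

print(out_of_range(ksampler_settings, KSAMPLER_RANGES))  # empty list if all fit
```

These ranges match the runs without a speed-up LoRA; the v3 test generations above used 4 steps, so the steps range here does not apply to that setup.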
I have an RTX 5060 16GB, 64GB RAM and an i5 24600KF. My average generation time for a 4-second video is a stable 1000-1300 seconds.
It all depends on the image size and the number of steps in KSampler. I mostly work with 512*512 and 480*720.
This is my first workflow that gives decent quality and speed. At least I'm happy with it.
Try it. I hope it helps someone.
P.S. I will keep working on this workflow, let me know if anyone is interested in it.
