(NSFW) Dead-Simple MMAudio + RIFE Interpolation Setup for WAN 2.2 I2V 14B
Changelog
Version 1.0.1: The RIFE group output was accidentally set to 8fps; changed it to 24fps.
Version 1.0: Initial release
A TRIBUTE TO GOONERS EVERYWHERE
Your WAN 2.2 video is great. It looks awesome. But where's the sound? We moved from images to videos, and WAN 2.2 is incredible for video. The missing piece...AUDIO!
This is my first article ever, so I'm sorry if I made any mistakes. Please leave a comment if I've made an error or if you need any help. For your reference, I'm running:
ComfyUI 0.3.68
Torch 2.9
CUDA 13
Python 3.13.9
Sage Attention 2.2
NVIDIA RTX 5070 Ti (16 GB VRAM)
And here are the custom nodes (3 in total):
ComfyUI-VideoHelperSuite 1.7.7 (https://github.com/Kosinkadink/ComfyUI-VideoHelperSuite)
ComfyUI-MMAudio Nightly (https://github.com/kijai/ComfyUI-MMAudio)
ComfyUI-VFI, version unknown (https://github.com/GACLove/ComfyUI-VFI)
- I think there's a more popular RIFE custom node that a lot of people use, but I couldn't figure out how to get fractional interpolation multiples out of it (16 -> 24fps is a 1.5x interpolation). This node allows it.
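For anyone curious why 1.5x is awkward: going from 16 to 24fps, only every third output frame lines up exactly with an original frame, and the other two have to be generated at fractional positions between two originals (that's the part RIFE fills in). Here's a minimal Python sketch of that timing math. It shows the generic way fractional-rate interpolation works, not necessarily how ComfyUI-VFI does it internally, and `interpolation_plan` is a hypothetical helper, not part of any node.

```python
from fractions import Fraction


def interpolation_plan(num_src_frames: int, src_fps: int, dst_fps: int):
    """For each output frame, return (left_index, right_index, t).

    t is the blend position between the two neighbouring source frames
    (0 means exactly the left frame). t == 0 frames can be copied as-is;
    every other frame has to be synthesized by the interpolator.
    """
    ratio = Fraction(src_fps, dst_fps)  # source frames advanced per output frame
    last = num_src_frames - 1
    plan = []
    k = 0
    while True:
        pos = k * ratio  # exact position on the source timeline, in source frames
        if pos > last:
            break
        left = int(pos)  # floor, since pos is never negative
        t = pos - left
        right = min(left + 1, last)  # clamp at the final source frame
        plan.append((left, right, t))
        k += 1
    return plan
```

With 5 source frames at 16fps you get 7 output frames at 24fps: output frames 0, 3 and 6 are straight copies of originals, and the rest sit 1/3 or 2/3 of the way between two originals.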
Onto the workflow...
------------------------------------
This workflow handles two jobs:
Fix WAN 2.2’s native 16fps output by interpolating it to 24fps with RIFE.
Generate synced audio with MMAudio using the final 24fps video.
The setup is plug-and-play. Drop in your WAN video → interpolate → feed it into MMAudio → get synced output. The included notes explain the reasoning for FPS, step settings, and seed behavior.
What this workflow covers:
RIFE interpolation from 16 → 24 fps.
MMAudio sampler with recommended settings (50 steps, cfg 4.5).
Automatic audio + video combine at 24fps.
Optional re-interpolation afterward if you want 30fps+ output.
- You can plug your finished 24fps video into the 'Step 1: Rife Interpolation' group and just change the 'source_fps' to 24 and the 'target_fps' to 30.
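If you chain 16 → 24 → 30fps, the frame count changes but the clip length shouldn't. Here's a rough sketch of that arithmetic, assuming the common convention that N frames span (N - 1) / fps seconds with the first and last frames kept. Other tools may count slightly differently, and both helpers are hypothetical, not nodes in the workflow.

```python
from fractions import Fraction


def retimed_frame_count(num_frames: int, src_fps: int, dst_fps: int) -> int:
    """Frames needed at dst_fps to cover the same span as num_frames at src_fps.

    Assumes N frames span (N - 1) / fps seconds and that the first and last
    frames are kept.
    """
    if num_frames <= 0:
        return 0
    span = Fraction(num_frames - 1, src_fps)  # seconds from first to last frame
    return int(span * dst_fps) + 1  # frames that fit in the span, plus the first one


def playback_seconds(num_frames: int, fps: int) -> float:
    """Playback length of a clip when each frame is shown for 1/fps seconds."""
    return num_frames / fps
```

For example, an 81-frame clip at 16fps (just an example length) comes out to 121 frames at 24fps and 151 frames at 30fps. It's also why the combine step has to use the same fps as the interpolation: 121 frames played at 24fps last about 5.04 s, but played at 16fps they'd stretch to about 7.56 s while the audio stays the same length.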
Required MMAudio files
Download all of these into:
ComfyUI/models/mmaudio
MMAudio NSFW Model (fine-tuned from the base model)
MMAudio VAE (fp16)
MMAudio Synchformer (fp16)
https://huggingface.co/Kijai/MMAudio_safetensors/resolve/main/mmaudio_synchformer_fp16.safetensors
MMAudio CLIP Encoder (fp16)
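If you'd rather script the downloads, here's a minimal sketch using the `huggingface_hub` package (my assumption; the workflow itself doesn't need it). It splits a Hugging Face `/resolve/` link, like the Synchformer one above, into repo, revision and filename, then saves the file straight into `ComfyUI/models/mmaudio`. Both helper names are hypothetical.

```python
from pathlib import Path
from urllib.parse import urlparse


def parse_hf_resolve_url(url: str) -> tuple[str, str, str]:
    """Split a Hugging Face '/resolve/' URL into (repo_id, revision, filename)."""
    parts = urlparse(url).path.strip("/").split("/")
    # Expected layout: <owner>/<repo>/resolve/<revision>/<path/to/file>
    if len(parts) < 5 or parts[2] != "resolve":
        raise ValueError(f"not a Hugging Face /resolve/ URL: {url}")
    repo_id = "/".join(parts[:2])
    revision = parts[3]
    filename = "/".join(parts[4:])
    return repo_id, revision, filename


def download_to_mmaudio(url: str, comfy_root: str = "ComfyUI") -> Path:
    """Download one file into <comfy_root>/models/mmaudio via huggingface_hub."""
    # Imported lazily so the URL parsing above works without the package installed.
    from huggingface_hub import hf_hub_download  # pip install huggingface_hub

    repo_id, revision, filename = parse_hf_resolve_url(url)
    target_dir = Path(comfy_root) / "models" / "mmaudio"
    target_dir.mkdir(parents=True, exist_ok=True)
    path = hf_hub_download(
        repo_id=repo_id,
        filename=filename,
        revision=revision,
        local_dir=target_dir,
    )
    return Path(path)
```

Run `download_to_mmaudio("https://huggingface.co/Kijai/MMAudio_safetensors/resolve/main/mmaudio_synchformer_fp16.safetensors")` from the folder that contains `ComfyUI`, and repeat with the link for each of the other files.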
Bonus
Once you've created a good MMAudio track, there are some further steps you can take depending on what you'd like to create.
1. Import your audio/video into a video editor (e.g. CapCut or Shotcut) and layer some background music underneath. I've done this with a few of my videos, adding a 'radio' filter so the music sounds tinny, as if it's playing somewhere in the background.
2. Layer other audio tracks alongside the NSFW audio track. KaptainSisay did something like this very elegantly here: (https://civitai.com/images/110700679).
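If you'd rather skip the editor for option 1, ffmpeg can do a rough version of the same thing. This is a swapped-in alternative to CapCut/Shotcut, not what I used: a high-pass plus low-pass filter (300 to 3000 Hz here, an assumed starting point you should adjust by ear) approximates the tinny 'radio' sound, then `amix` blends it under the MMAudio track. The function and file names are hypothetical, and the code only builds the command so you can inspect it before running anything.

```python
import shlex


def radio_music_mix_cmd(video: str, music: str, out: str,
                        music_volume: float = 0.25,
                        low_hz: int = 300, high_hz: int = 3000) -> list[str]:
    """Build an ffmpeg command that mixes band-limited ('radio') music under a clip.

    The video stream is copied untouched; the clip's own audio is mixed with
    the filtered music, and the mix ends when the clip's audio ends.
    """
    # Band-limit and quiet the music (input 1) to get the tinny background sound.
    music_chain = (
        f"[1:a]highpass=f={low_hz},lowpass=f={high_hz},"
        f"volume={music_volume}[bg]"
    )
    # Blend the clip's audio (input 0) with the processed music.
    mix = "[0:a][bg]amix=inputs=2:duration=first[aout]"
    return [
        "ffmpeg", "-i", video, "-i", music,
        "-filter_complex", f"{music_chain};{mix}",
        "-map", "0:v", "-map", "[aout]",
        "-c:v", "copy", "-c:a", "aac",
        out,
    ]


if __name__ == "__main__":
    # Hypothetical file names: the MMAudio output and a background music track.
    cmd = radio_music_mix_cmd("wan_mmaudio_24fps.mp4", "music.mp3", "with_music.mp4")
    print(shlex.join(cmd))  # inspect first, then run with subprocess.run(cmd, check=True)
```

`amix` rebalances input levels by default, so expect to tweak `music_volume` by ear. The same pattern extends to option 2: one more `-i` per extra track, a filter chain for each, and a matching `inputs=` count on `amix`.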
