MMAudio Batch Soundifier

v1.1 contains audio tools. The easiest way to nail down a correction factor is to keep your video output settings consistent. Compare the exact length of your video to the exact length of the audio output (the audio by itself as .mp3, FLAC, whatever, NOT the combined file). Use these two values to work out a correction factor for the rate stretcher tool. Fading the ends out gets rid of pops at the beginning and end. Notes on all of this are in the WF.
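The correction factor is just a ratio of the two measured lengths. Below is a minimal Python sketch that assumes both lengths are measured in seconds. It is an assumption whether the rate stretcher node expects a duration multiplier or a playback-speed multiplier. The two are reciprocals, so both forms are shown. `duration_factor` and `speed_factor` are hypothetical helper names, not nodes in the WF.

```python
def duration_factor(video_s: float, audio_s: float) -> float:
    """Multiply the audio duration by this to land on the video duration."""
    if video_s <= 0 or audio_s <= 0:
        raise ValueError("both lengths must be positive")
    return video_s / audio_s


def speed_factor(video_s: float, audio_s: float) -> float:
    """Playback-rate form (reciprocal of duration_factor).

    A value above 1 speeds the audio up, which shortens it.
    """
    return 1.0 / duration_factor(video_s, audio_s)


# Example: the video is 8.000 s and the standalone audio file is 8.064 s.
f = duration_factor(8.0, 8.064)
print(f"stretch duration by {f:.6f} (or play at {speed_factor(8.0, 8.064):.6f}x)")
```

The offset comes from your output settings. So a factor measured once should carry over to other clips rendered with the same settings, which is the point of keeping them consistent.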

v1.2 is for long source videos. I didn't realize MBM ignores all but one batch, so I built my own batcher. It splits your video into chunks; you set the chunk length (in seconds) with a node. It defaults to 15, which is a good maximum for MMAudio. The number of frames per chunk is calculated from the framerate, so you don't have to worry about that. There are notes detailing how to extend the tree if you need more than four chunks. There may be a few more custom nodes in there than in v1; I'm not sure. It worked great for me in testing. I really should have thought about this before, given that I always put out really long videos. Then again, mine mostly get their audio right in the generation WF, and I set this up to go over old stuff that was already segmented. Anyway, it's done. Please let me know if there are bugs.
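The chunk math works like this: frames per chunk is the framerate times the chunk length, and the last chunk takes whatever is left. Below is a minimal Python sketch that assumes a constant framerate. `plan_chunks` is a hypothetical name, and the sketch does not show how the actual batcher passes start and count values to the loader.

```python
def plan_chunks(total_frames: int, fps: float, chunk_seconds: float = 15.0):
    """Return (start_frame, frame_count) for each chunk of a video.

    Frames per chunk are derived from the framerate, so the chunk length
    is set in seconds. The final chunk holds the remainder.
    """
    frames_per_chunk = max(1, int(round(fps * chunk_seconds)))
    chunks = []
    start = 0
    while start < total_frames:
        count = min(frames_per_chunk, total_frames - start)
        chunks.append((start, count))
        start += count
    return chunks


for start, count in plan_chunks(total_frames=1500, fps=30):
    print(f"chunk: frames {start}..{start + count - 1} ({count} frames)")
```

Take a 60-second clip at 24fps (1440 frames) with the default of 15 seconds. It yields exactly four chunks of 360 frames. One extra frame would spill into a fifth chunk, which is when you'd need to extend the tree.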

Simple batch processing for adding sound to videos. Naughty enabled, not mandatory. It uses the basic for-loop that I use for all my batch workflows. Super simple: specify your source and target directories, then click Run as many times as there are videos in the folder. Or click until you get tired; it won't make a difference. It keeps track of what's done, and it knows when it's out of naughty videos to work with.
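The "keeps track of what's done" behaviour can be sketched in three steps. List the source folder, skip anything that already has an output in the target folder, and stop when nothing is left. Below is a minimal Python sketch. It assumes an output counts as done when it shares the source file's stem (the name without extension). Save nodes that append counters to filenames would need an extra normalization step. `next_unprocessed` and the extension list are hypothetical and are not taken from the WF.

```python
from pathlib import Path

# Hypothetical set of extensions treated as source videos.
VIDEO_EXTS = {".mp4", ".webm", ".mov", ".mkv"}


def next_unprocessed(source_dir, target_dir):
    """Return the first source video with no same-stem output, or None when done."""
    target = Path(target_dir)
    # Any file in the target folder (video or audio) marks its stem as finished.
    done = {p.stem for p in target.iterdir() if p.is_file()} if target.is_dir() else set()
    for p in sorted(Path(source_dir).iterdir()):
        if p.is_file() and p.suffix.lower() in VIDEO_EXTS and p.stem not in done:
            return p
    return None  # nothing left: every click after this is a no-op
```

Each run picks up the next unfinished file, so clicking more times than there are videos does no harm.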

MMAudio is trained on 24fps. There is a button for switching to higher framerates, which work fine with a few small caveats that I've noted within. Actually, now that I think about it, the switch probably isn't needed, given the source/loaded options in VHS Video Info... The model cares about duration and will decimate whatever it doesn't like on the image input side. So there really shouldn't be an issue unless you're feeding it something below 24fps. But I use it like this, and it works.
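To illustrate what "caring about duration" means, here is a generic time-based resampling sketch. Each output frame at the target rate takes the source frame at or just before its timestamp. This is an assumption made for illustration, not MMAudio's actual preprocessing code. Going from 60fps down to 24fps drops frames. Anything below 24fps has to repeat frames, which is why low framerates are the case to watch. `resample_indices` is a hypothetical name.

```python
def resample_indices(n_frames: int, src_fps: float, dst_fps: float):
    """Source-frame indices kept when resampling a clip to dst_fps by timestamp.

    Duration is preserved: the output has duration * dst_fps frames, each
    taken from the source frame at or just before its timestamp.
    """
    duration = n_frames / src_fps
    n_out = int(duration * dst_fps)
    return [min(n_frames - 1, int(i * src_fps / dst_fps)) for i in range(n_out)]


print(resample_indices(60, 60, 24)[:6])  # 60fps -> 24fps skips frames
print(resample_indices(16, 16, 24)[:6])  # 16fps -> 24fps repeats frames
```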

I've done 12-15 second segments, but they don't turn out as well as shorter ones. 8-9 seconds gives the best quality, IMO. I make mostly minutes-long compositions, so generating audio for finished comps means chopping them into segments first. Shutter Encoder's 'split' function works great for this and is really, really quick; way faster than using Premiere. Also, if you do need to split a long video before processing, you can force 24fps output at the same time. This speeds up MMAudio compared to running it on a 60fps video.
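You might want evenly sized segments in the 8-9 second sweet spot instead of fixed-length cuts with a short leftover tail. In that case, the cut points can be computed up front. Below is a minimal Python sketch that assumes a 9-second ceiling. `even_segments` is a hypothetical helper, and it only computes timestamps; it doesn't drive Shutter Encoder.

```python
import math


def even_segments(total_seconds: float, max_seconds: float = 9.0):
    """Split a duration into the fewest equal segments no longer than max_seconds.

    Returns (start, end) pairs in seconds.
    """
    if total_seconds <= 0 or max_seconds <= 0:
        raise ValueError("durations must be positive")
    n = math.ceil(total_seconds / max_seconds)
    length = total_seconds / n
    return [(i * length, (i + 1) * length) for i in range(n)]


for start, end in even_segments(60):
    print(f"{start:6.2f}s -> {end:6.2f}s")
```

A 60-second comp cut at a fixed 9 seconds leaves a 6-second stub at the end. `even_segments(60)` instead gives seven segments of about 8.57 seconds each.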

Keep prompts to a bare minimum; the NSFW model can fill in the blanks. Only add details as needed.

Filenames automatically transfer to the new combine. I'm adding sound to a lot of finished 60fps stuff, so there are audio save nodes in there; I don't need to save the video again.

Be careful when adding interpolation and scaling. Parsimony and judicious use of VRAM purges may be needed when incorporating this into larger workflows. It can get tricky if you're processing a lower framerate for audio but sending the output to an interpolated combine. When starting from scratch with the intent to add audio, I prefer to save to a dedicated 24fps folder straight out of the WAN decode, or after upscaling (an upscale does help the audio model see what's going on a bit better). This way I avoid wasting time processing videos that are NG.
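The framerate trap with interpolation comes down to duration. The audio is timed to the low-framerate clip, so the interpolated video has to end up the same length. Below is a minimal Python sketch. It assumes interpolation multiplies the frame count by exactly the multiplier; real interpolation nodes may add or drop a frame at the ends. The function names are hypothetical.

```python
def duration_mismatch(n_source_frames: int, source_fps: float,
                      multiplier: int, combine_fps: float) -> float:
    """Seconds by which the interpolated video differs from the audio.

    The audio is generated for the source clip, so its length is
    n_source_frames / source_fps. Negative means the video is shorter.
    """
    audio_s = n_source_frames / source_fps
    video_s = n_source_frames * multiplier / combine_fps
    return video_s - audio_s


def matching_combine_fps(source_fps: float, multiplier: int) -> float:
    """Combine framerate that keeps the interpolated video the same length."""
    return source_fps * multiplier


print(duration_mismatch(192, 24, 2, matching_combine_fps(24, 2)))  # in sync
print(duration_mismatch(192, 24, 2, 60))                           # video too short
```

Take an 8-second, 192-frame 24fps clip interpolated 2x. Combining at 48fps keeps it in sync. Combining at 60fps shortens the video to 6.4 seconds and leaves 1.6 seconds of audio hanging.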

Here is a WF with interpolation included: (NSFW) Dead-Simple MMAudio + RIFE Interpolation Setup for WAN 2.2 I2V 14B

SeoulSeeker kindly brought to my attention how much MMAudio has improved since I last fooled around with it. As you know, in this space that usually means a few weeks at most.

Model Type: Workflows
Base Model: SD 1.5
Published: 2025-12-11
