MMAudio Batch Soundifier

Details

Model description

Simple batch processing for adding sound to videos. Naughty enabled, but not mandatory. It uses the basic for loop that I use for all my batch workflows. Super simple: specify your source and target directories, then click run once for each video in the folder. Or click until you get tired; it won't make a difference. It keeps track of what is done, and it knows when it's out of naughty videos to work with.
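If you want a picture of what that bookkeeping amounts to, here is a minimal Python sketch. It is not the workflow's actual node logic, just one way to do the same thing outside ComfyUI. It assumes a video counts as "done" when the target folder holds any file with the same base name. The function name `next_unprocessed` and the extension list are made up for illustration.

```python
from pathlib import Path

# Assumed set of video extensions; adjust to whatever your folder holds.
VIDEO_EXTS = {".mp4", ".mov", ".webm", ".mkv"}


def next_unprocessed(source_dir, target_dir):
    """Return the first source video with no same-named output yet, or None.

    Assumption: an output "matches" a source when the base names (stems)
    are equal, whatever the output extension is (.flac, .wav, .mp4, ...).
    """
    source = Path(source_dir)
    target = Path(target_dir)

    # Collect the base names of everything already written to the target.
    done = set()
    if target.is_dir():
        done = {p.stem for p in target.iterdir() if p.is_file()}

    # Walk the source folder in a stable order and pick the first gap.
    for video in sorted(source.iterdir()):
        if not video.is_file():
            continue
        if video.suffix.lower() not in VIDEO_EXTS:
            continue
        if video.stem not in done:
            return video

    # Nothing left to do: extra clicks on "run" change nothing.
    return None
```

Calling this once per run gives the "click until you get tired" behaviour: each call picks up the next video, and once everything has an output it just returns `None`.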

MMAudio is trained with 24fps video. There is a button to switch to higher framerates, which work fine, with a few small caveats that I've noted inside the workflow. Actually, now that I think about it, the switch is probably not needed, given the source/loaded options in VHS Video Info. The model cares about duration and will decimate whatever it doesn't like on the image input side, so there really shouldn't be an issue unless you're giving it something below 24fps. But I use it like this, and it works.
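To make "cares about duration, decimates the rest" concrete, here is a rough Python sketch of dropping frames from a higher framerate down to 24fps while keeping the clip length the same. This is an illustration only, not MMAudio's or VHS's actual resampling code. The function name `decimated_indices` is hypothetical, and nearest-earlier-frame picking is an assumed strategy.

```python
def decimated_indices(n_frames, src_fps, dst_fps=24.0):
    """Pick which source frames survive when decimating to dst_fps.

    Duration is preserved: the output has about duration * dst_fps frames.
    Sources at or below dst_fps are passed through untouched, since
    decimation can only drop frames, not invent them.
    """
    if n_frames <= 0 or src_fps <= 0 or dst_fps <= 0:
        raise ValueError("frame count and framerates must be positive")

    if src_fps <= dst_fps:
        return list(range(n_frames))

    duration = n_frames / src_fps           # clip length in seconds
    n_out = int(round(duration * dst_fps))  # frames needed at the target rate
    step = src_fps / dst_fps                # source frames per output frame

    # For each output frame, take the source frame at or just before its timestamp.
    return [min(int(i * step), n_frames - 1) for i in range(n_out)]
```

For a one-second 60fps clip this keeps 24 of the 60 frames, so the audio model sees the same duration with far fewer images to chew through.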

I've done 12-15 second segments, but they don't turn out as well as shorter ones. 8-9 seconds gives the best quality, IMO. Since I mostly make minutes-long compositions, generating audio for finished comps means chopping them into segments first. Shutter Encoder's 'split' function does this really, really quickly, way faster than using Premiere. Also, if you do need to split a long video before processing, you can force 24fps output at the same time, which will speed up MMAudio (as opposed to running it with a 60fps video).
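If you'd rather script the chopping than click through a GUI, the arithmetic is simple. The Python sketch below plans 8-second segments and builds an ffmpeg command for each one with 24fps output forced. This swaps plain ffmpeg in for Shutter Encoder's 'split' function and is not how Shutter Encoder does it internally. The helper names `plan_segments` and `ffmpeg_split_command` are hypothetical, and the sketch assumes ffmpeg is installed and that you already know the total duration.

```python
import math


def plan_segments(total_seconds, segment_seconds=8.0):
    """Return (start, length) pairs covering the whole clip.

    Every segment is segment_seconds long except possibly the last,
    which holds whatever is left over.
    """
    if total_seconds <= 0 or segment_seconds <= 0:
        raise ValueError("durations must be positive")

    count = math.ceil(total_seconds / segment_seconds)
    segments = []
    for i in range(count):
        start = i * segment_seconds
        length = min(segment_seconds, total_seconds - start)
        segments.append((start, length))
    return segments


def ffmpeg_split_command(src, dst, start, length, fps=24):
    """Build (but do not run) an ffmpeg command for one segment.

    -ss before -i seeks in the input, -t limits the output duration,
    and -r after -i sets the output framerate.
    """
    return [
        "ffmpeg",
        "-ss", f"{start:.3f}",
        "-i", str(src),
        "-t", f"{length:.3f}",
        "-r", str(fps),
        str(dst),
    ]


if __name__ == "__main__":
    # Example: a 20-second comp becomes 8s + 8s + 4s.
    for n, (start, length) in enumerate(plan_segments(20.0)):
        print(" ".join(ffmpeg_split_command("comp.mp4", f"comp_{n:03d}.mp4", start, length)))
```

Each command list can be handed to `subprocess.run` as-is. A very short final segment may be worth merging into the previous one by hand, since the plan above doesn't try to balance lengths.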

Keep prompts to a bare minimum; the NSFW model can fill in the blanks. Only add details as needed.

Filenames carry over automatically to the new combine. I'm adding sound to a lot of finished 60fps stuff, so there are audio save nodes in there, since I don't need to save the video again.

Be careful when adding interpolation and scaling: parsimony and judicious use of VRAM purges may be needed when incorporating this into larger workflows. It can get tricky if you're processing a lower framerate for audio but sending the output to an interpolated combine. When starting from scratch with the intent to add audio, I prefer to set up a dedicated 24fps folder straight out of WAN decode, or after upscale (an upscale does help the audio model see what's going on a bit better). This way I avoid wasting time processing videos that are no good.

Here is a workflow with interpolation included (NSFW): Dead-Simple MMAudio + RIFE Interpolation Setup for WAN 2.2 I2V 14B

SeoulSeeker kindly brought to my attention how much MMAudio has improved since I last fooled around with it. As you know, "since I last fooled around with it" usually means a few weeks at most.
