Video Media Toolkit: Streamline Downloads, Frame Extraction, Audio Separation & AI Upscaling for Stable Diffusion Workflows | Utility Tool v6.0

Overview

Elevate your AI art pipeline with Video Media Toolkit v6, a free, open-source desktop utility designed for Stable Diffusion creators, trainers, and video-to-image enthusiasts. This all-in-one Windows app handles media ingestion, breakdown, enhancement, and reassembly—perfect for sourcing high-quality frames from YouTube/Reddit videos for LoRA training, isolating vocals/instruments for audio-reactive generations, or upscaling low-res assets to feed into ComfyUI or Automatic1111 workflows.

Whether you're prepping datasets for Flux/Stable Diffusion fine-tuning or crafting dynamic video inputs for AnimateDiff extensions, this tool saves hours by automating tedious tasks with yt-dlp, FFmpeg, Demucs, and Real-ESRGAN under the hood. GPU acceleration is supported for much faster processing on NVIDIA setups.

Key Benefits:

  • Batch Download & Queue: Pull videos/audio from URLs or local files, output as MP4/MP3 or frame sequences (JPG/PNG) ready for dataset prep.

  • AI-Powered Breakdown: Extract clean audio stems (vocals, drums, etc.) or frames for training—ideal for NSFW/SFW content curation.

  • Enhance & Rebuild: Denoise, sharpen, upscale (2x-4x), and reassemble with stabilization for polished video outputs.

  • Workflow Integration: Exports compatible with A1111, ComfyUI, Kohya_ss, or Hugging Face datasets. No more manual FFmpeg scripting!

Tested on Windows 10/11; Python 3.8+ required. ~500MB install size (includes torch, which falls back to CPU when CUDA is unavailable).

Features

Download Tab: Source & Extract Media

  • Input: URLs (YouTube, Reddit media, direct links) or local files.

  • Outputs: MP4 (enhanced video), MP3 (audio), or frame folders (e.g., frame_0001.png for SD training).

  • Enhancements: Resolution (360p-8K), CRF quality, FPS control, sharpen/color correct/deinterlace/denoise.

  • Audio Options: Noise reduction, volume norm—great for clean stems.

  • Queue System: Add multiple jobs, sequential processing, auto-delete sources, custom yt-dlp/FFmpeg args.

  • Pro Tip: Extract 1000+ frames from a 5-min video in seconds; auto-handles Reddit wrappers.
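
Under the hood this tab drives yt-dlp and FFmpeg; the sketch below shows roughly equivalent calls done by hand (the URL, output folder, format choice, and 4 fps sampling rate are placeholders, not the app's defaults):

```python
# Rough equivalent of a Download-tab job: fetch a video with yt-dlp,
# then dump a numbered PNG frame sequence with FFmpeg.
import subprocess
from pathlib import Path

url = "https://www.youtube.com/watch?v=XXXXXXXXXXX"   # placeholder URL
frames_dir = Path("output/frames")
frames_dir.mkdir(parents=True, exist_ok=True)

# Download the best MP4 rendition to a fixed filename.
subprocess.run(["yt-dlp", "-f", "mp4", "-o", "source.mp4", url], check=True)

# Sample the video at 4 fps into zero-padded PNGs (frame_0001.png, ...).
subprocess.run([
    "ffmpeg", "-i", "source.mp4",
    "-vf", "fps=4",
    str(frames_dir / "frame_%04d.png"),
], check=True)
```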

Reassemble Tab: Rebuild Videos from Frames

  • Input: Frame folder (e.g., from Download or external edits).

  • Options: Set FPS, merge audio, apply minterpolate (motion smoothing), tmix (frame blending), deshake, deflicker.

  • Output: MP4 with custom FFmpeg filters—export stabilized clips for AnimateDiff or video LoRAs.

  • Use Case: Upscale frames → Reassemble into 4K training videos.
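
For reference, reassembly corresponds to an FFmpeg image-sequence input plus an optional filter chain; here is a minimal hand-rolled sketch (frame pattern, frame rates, filters, and CRF are illustrative choices, not the tab's exact defaults):

```python
# Rebuild an MP4 from a numbered frame folder with optional motion
# smoothing (minterpolate) and deflicker, roughly what the Reassemble tab does.
import subprocess

frames = "output/frames/frame_%04d.png"     # sequence from the Download tab
filters = "minterpolate=fps=60,deflicker"   # example filter chain

subprocess.run([
    "ffmpeg",
    "-framerate", "30", "-i", frames,       # input frame rate of the sequence
    "-vf", filters,
    "-c:v", "libx264", "-crf", "18", "-pix_fmt", "yuv420p",
    "reassembled.mp4",
], check=True)
```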

Audio Tab: Demucs-Powered Stem Separation

  • Input: MP3/WAV/FLAC from downloads.

  • Models: htdemucs, mdx_extra, etc. (GPU/CPU modes).

  • Outputs: Isolated tracks (vocals, bass, drums) to subfolders—feed into audio-conditioned SD prompts.

  • Modes: Full 6-stem or two-stem (vocals + instrumental) for quick remixing.
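
This maps onto the standard Demucs command line; a minimal two-stem sketch (file name, model, and device are example values):

```python
# Two-stem Demucs separation, roughly what the Audio tab runs:
# writes vocals.wav and no_vocals.wav under separated/htdemucs/<track>/.
import subprocess

subprocess.run([
    "demucs",
    "-n", "htdemucs",           # model selection
    "--two-stems", "vocals",    # vocals + accompaniment only
    "-d", "cuda",               # use "cpu" if no NVIDIA GPU is available
    "-o", "separated",          # output root folder
    "song.mp3",                 # placeholder input file
], check=True)
```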

Upscale Tab: Real-ESRGAN Frame Enhancement

  • Input: Image folder (e.g., extracted frames).

  • Scale: 2x/3x/4x for SD-ready high-res assets.

  • Output: Batch-upscaled folder—boost low-res videos to 4K for better model training.

  • GPU Boost: Torch-based; falls back to CPU.
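
Batch upscaling like this can be reproduced with the stock Real-ESRGAN Python API; the sketch below assumes the realesrgan/basicsr packages are installed, the x4plus weights sit at models/RealESRGAN_x4plus.pth, and the folder names are placeholders:

```python
# 4x-upscale every PNG in a folder with the Real-ESRGAN Python API.
from pathlib import Path
import cv2
from basicsr.archs.rrdbnet_arch import RRDBNet
from realesrgan import RealESRGANer

model = RRDBNet(num_in_ch=3, num_out_ch=3, num_feat=64,
                num_block=23, num_grow_ch=32, scale=4)
upsampler = RealESRGANer(scale=4,
                         model_path="models/RealESRGAN_x4plus.pth",
                         model=model, tile=0, half=False)

src, dst = Path("output/frames"), Path("output/frames_4x")
dst.mkdir(parents=True, exist_ok=True)
for img_path in sorted(src.glob("*.png")):
    img = cv2.imread(str(img_path), cv2.IMREAD_UNCHANGED)
    upscaled, _ = upsampler.enhance(img, outscale=4)    # returns (image, mode)
    cv2.imwrite(str(dst / img_path.name), upscaled)
```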

Additional Utilities:

  • Persistent output root folder selection.

  • Real-time logs + file export (logs/ dir).

  • Dependency tester (FFmpeg, yt-dlp, Demucs).

  • High-contrast dark UI for long sessions.
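
The dependency tester amounts to probing each external tool on PATH; a minimal sketch of that idea (the app's actual checks may be more thorough):

```python
# Report [OK]/[WARNING] for each external tool, similar in spirit to the
# "Test Dependencies" button.
import shutil

for tool in ("ffmpeg", "yt-dlp", "demucs"):
    status = "[OK]" if shutil.which(tool) else "[WARNING] not found on PATH"
    print(f"{tool}: {status}")
```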

Installation & Setup

  1. Download: Grab the ZIP from the GitHub Repo (or the attachment on this page).

  2. Run Installer: Double-click video_media_installer.bat—auto-installs PySide6, torch (CUDA if detected), Demucs, Real-ESRGAN, etc. Handles pip upgrades.

    • Manual Fixes: If you see a [WARNING] for FFmpeg or yt-dlp, download them from ffmpeg.org / the yt-dlp GitHub releases, then either add them to PATH or point the app at hard-coded paths.

  3. Model Download: Place RealESRGAN_x4plus.pth in /models/ for upscaling (link in README).

  4. Launch: Double-click launch_video_toolkit_v6.bat. Sets output folder on first run.

  5. Test: Use "Test Dependencies" button—aim for all [OK].
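
After installation, a quick way to confirm the CUDA-enabled torch build was picked up (useful before running Demucs or the upscaler on GPU):

```python
# Quick GPU sanity check after the installer finishes.
import torch

print("torch", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```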

Compatibility Notes:

  • Windows Focus: .bat launchers for easy setup; on Linux/macOS, run the Python entry point manually.

  • SD Integration: Frames export as numbered sequences (e.g., %04d.png) for direct import into Kohya or DreamBooth.

  • No A1111 Extension: Standalone app—pair with ControlNet for video-to-image pipelines.

  • Warnings: Large files may need 8GB+ RAM; a GPU is recommended for Demucs (CPU-only separation is much slower). NSFW content is handled according to each source site's policies.
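
If frames come back from an external editor with arbitrary names, a quick renumbering pass restores the zero-padded %04d.png sequence mentioned above; a small sketch (folder names are placeholders):

```python
# Copy arbitrarily named PNGs into a clean zero-padded sequence
# (0001.png, 0002.png, ...) that trainers and FFmpeg ingest directly.
from pathlib import Path

src = Path("output/edited_frames")              # placeholder input folder
dst = Path("output/sequence")
dst.mkdir(parents=True, exist_ok=True)
for i, img in enumerate(sorted(src.glob("*.png")), start=1):
    (dst / f"{i:04d}.png").write_bytes(img.read_bytes())   # copy under new name
```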

Usage Examples

  • LoRA Training Prep: Download anime clip → Extract PNG frames → Upscale 4x → Use in Kohya_ss dataset.

  • Audio-Reactive Art: Separate song vocals → Generate SD images with "vocal waveform" prompts.

  • Video Dataset: Batch-download 50 YouTube vids → Frames + stems → Train Flux on motion data.

Changelog (v6 Highlights)

  • Enhanced Reddit URL parsing.

  • Queue improvements + custom args.

  • Dark theme with better readability.

  • Bug fixes for Demucs GPU detection.
