Stable Audio 3.0 Medium Base workflow in ComfyUI | Text-to-Audio

Turn text prompts into rich, realistic audio and music.

Who it's for: creators who want this pipeline in ComfyUI without assembling nodes from scratch. Not for: one-click results with zero tuning - you still choose inputs, prompts, and settings.

Open preloaded workflow on RunComfy

Why RunComfy first
- Fewer missing-node surprises - run the graph in a managed environment before you mirror it locally.
- Quick GPU tryout - useful if your local VRAM or install time is the bottleneck.
- Matches the published JSON - the zip contains the same runnable workflow you can open on RunComfy.

A local ComfyUI download makes sense when you want full control over models on disk, batch scripting, or offline runs.

How to use (local ComfyUI)
1. Enter your text prompt (user_input).
2. Set the clip length (EmptyLatentAudio), the seed (KSampler), and any other settings; start with a short test run.
3. Export the result from the SaveAudioMP3 node.
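For batch scripting, a running local ComfyUI server also accepts queued jobs over HTTP. The sketch below assumes the default local address http://127.0.0.1:8188, a graph exported in API format (the "Export (API)" or "Save (API Format)" option, depending on your ComfyUI version), and a hypothetical file name. It uses only the Python standard library.

```python
import json
import urllib.request
import uuid
from typing import Optional

# Assumption: ComfyUI is running locally on its default address and port.
COMFY_URL = "http://127.0.0.1:8188"


def build_prompt_request(workflow_api: dict, client_id: Optional[str] = None) -> bytes:
    """Wrap an API-format workflow in the JSON body accepted by ComfyUI's /prompt route."""
    body = {"prompt": workflow_api, "client_id": client_id or str(uuid.uuid4())}
    return json.dumps(body).encode("utf-8")


def queue_workflow(workflow_api: dict, base_url: str = COMFY_URL) -> dict:
    """Queue one run on a live ComfyUI server and return its JSON reply (it includes a prompt_id)."""
    request = urllib.request.Request(
        f"{base_url}/prompt",
        data=build_prompt_request(workflow_api),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


# Usage (hypothetical file name; export the graph in API format first):
# with open("stable_audio_3_api.json", encoding="utf-8") as f:
#     print(queue_workflow(json.load(f)))
```

build_prompt_request is kept separate so the payload can be inspected without a server. queue_workflow needs ComfyUI to be running.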

Expectations - First run may pull large weights; cloud runs may require a free RunComfy account.

Overview

With this official audio generation workflow, you can turn text prompts into expressive, high-quality music and ambient audio. It supports longer clips, smooth tonal transitions, and layered sounds, which makes it useful for sound designers, musicians, and developers experimenting with text-to-audio generation. The workflow uses T5Gemma and Qwen3.5 to improve how closely the output follows the prompt, and keeping the seed fixed makes runs repeatable, so you can compare takes consistently.

Key nodes in the ComfyUI Stable Audio 3.0 Medium Base workflow

ComfySwitchNode (#34). Toggles between the original user_input and the Qwen-generated text. Turn it on to use the structured, length-matched Qwen rewrite, or off to pass your prompt through unchanged.
TextGenerate (#28). Runs Qwen3.5 with a category-specific system prompt to expand ideas. To customize the rewrite style, edit the category templates in JsonExtractString (#49) and the glue prompts in the adjacent Text Replace nodes.
EmptyLatentAudio (#11). Sets the clip length in seconds. Keep this value aligned with the inserted AUDIO_LENGTH token so the generated duration matches what the text describes.
KSampler (#3). Controls the denoising (sampling) process for Stable Audio 3. Change only the seed to get variations, and keep the other settings fixed so takes can be compared fairly.
SaveAudioMP3 (#19). Sets the filename prefix for the saved MP3 files, which makes results from multiple runs easy to organize.
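To compare takes as the KSampler note suggests, you can script a seed sweep over an API-format export of this workflow. This is a minimal sketch that assumes the export keeps the node IDs listed above (#3 and #19) as string keys, and that KSampler and SaveAudioMP3 expose inputs named seed and filename_prefix; check both against your own JSON. The helper names and the "sa3" prefix are hypothetical. It deliberately leaves EmptyLatentAudio (#11) untouched, so the clip length stays in step with the AUDIO_LENGTH token.

```python
import copy
from typing import Dict, List

# Node IDs from the key-node list (#3 KSampler, #19 SaveAudioMP3).
# Assumption: the API-format export keeps these IDs as string keys.
KSAMPLER_ID = "3"
SAVE_ID = "19"


def _node(workflow_api: Dict, node_id: str, expected_class: str) -> Dict:
    """Fetch a node and fail loudly if the ID points at a different node type."""
    node = workflow_api[node_id]
    if node.get("class_type") != expected_class:
        raise ValueError(f"node {node_id} is {node.get('class_type')!r}, expected {expected_class!r}")
    return node


def make_seed_sweep(workflow_api: Dict, seeds: List[int], prefix: str = "sa3") -> List[Dict]:
    """Return one workflow copy per seed; only the seed and the filename prefix differ."""
    variants = []
    for seed in seeds:
        wf = copy.deepcopy(workflow_api)  # leave the caller's workflow untouched
        _node(wf, KSAMPLER_ID, "KSampler")["inputs"]["seed"] = seed
        _node(wf, SAVE_ID, "SaveAudioMP3")["inputs"]["filename_prefix"] = f"{prefix}_seed{seed}"
        variants.append(wf)
    return variants
```

Each returned dict is a complete workflow that can be queued as a separate run. The per-seed filename prefix keeps the resulting MP3s apart in the output folder.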

Notes

See the RunComfy page for the latest node requirements.

Model type: Workflows
Base model: Other
Published: 2026-06-05