Stable Audio 3.0 Medium Base workflow in ComfyUI | Text-to-Audio
Turn text prompts into rich, realistic audio and music.
Who it's for: creators who want this pipeline in ComfyUI without assembling the nodes from scratch. Not for: anyone expecting one-click results with zero tuning - you still choose inputs, prompts, and settings.
Open preloaded workflow on RunComfy
Why RunComfy first
- Fewer missing-node surprises - run the graph in a managed environment before you mirror it locally.
- Quick GPU tryout - useful if your local VRAM or install time is the bottleneck.
- Matches the published JSON - the downloadable zip contains the same runnable workflow that opens on RunComfy.
When downloading for local ComfyUI makes sense - when you want full control over models on disk, batch scripting, or offline runs.
How to use (local ComfyUI)
1. Enter your inputs in the marked input nodes; for this text-to-audio workflow, the main input is the text prompt.
2. Set prompts, clip length, and seeds; start with a short test run.
3. Export from the Save / Write nodes shown in the graph.
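For the batch scripting case, a local ComfyUI server can also be driven over HTTP. The sketch below is a minimal example under stated assumptions: the server runs at the default address `http://127.0.0.1:8188`, and you have exported this workflow in API format (the menu entry differs between ComfyUI versions, e.g. "Export (API)" or "Save (API Format)"). The filename `stable_audio_3_api.json` is hypothetical. The code wraps the graph in the `{"prompt": ..., "client_id": ...}` body that ComfyUI's `/prompt` endpoint accepts and returns the server's JSON reply, which should include a `prompt_id`.

```python
import json
import urllib.request
import uuid
from typing import Optional

# Assumption: default address of a locally running ComfyUI server.
COMFY_URL = "http://127.0.0.1:8188"


def build_prompt_payload(workflow: dict, client_id: Optional[str] = None) -> bytes:
    """Wrap an API-format workflow in the JSON body sent to ComfyUI's /prompt endpoint."""
    body = {"prompt": workflow, "client_id": client_id or str(uuid.uuid4())}
    return json.dumps(body).encode("utf-8")


def queue_workflow(workflow: dict, base_url: str = COMFY_URL) -> dict:
    """POST the workflow to /prompt and return the parsed JSON reply."""
    req = urllib.request.Request(
        f"{base_url}/prompt",
        data=build_prompt_payload(workflow),  # supplying data makes this a POST
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


# Usage (assumes a running local server and a hypothetical API-format export):
#   with open("stable_audio_3_api.json", encoding="utf-8") as f:
#       print(queue_workflow(json.load(f)))
```

Keeping payload construction separate from the network call lets you inspect or log exactly what is submitted before anything reaches the server.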
Expectations - First run may pull large weights; cloud runs may require a free RunComfy account.
Overview
With this official audio generation setup, you can turn text prompts into expressive, high-quality music and ambient audio. It supports extended clip lengths, smooth tonal transitions, and layered sounds. It suits sound designers, musicians, and developers experimenting with text-to-audio generation. The workflow uses T5Gemma and Qwen3.5 models to improve prompt adherence and output quality. Because seeds and settings are fixed in the graph, runs can be reproduced, which keeps results consistent across professional audio projects.
Key nodes in the ComfyUI Stable Audio 3.0 Medium Base workflow
- ComfySwitchNode (#34). Toggles between the original user_input and the Qwen-generated text. Turn it on for structured, length-matched rewrites or off for direct control.
- TextGenerate (#28). Runs Qwen3.5 with a category-specific system prompt to expand ideas. To customize the rewrite style, edit the category templates in JsonExtractString (#49) and the glue prompts in the adjacent Text Replace nodes.
- EmptyLatentAudio (#11). Sets the clip length. Keep it aligned with the inserted AUDIO_LENGTH token so the synthesized duration matches what the text describes.
- KSampler (#3). Governs the denoising trajectory for Stable Audio 3. Change the seed for variations while keeping other settings fixed, so takes can be compared fairly.
- SaveAudioMP3 (#19). Sets the output filename prefix and format, which makes it easy to build a library from multiple runs.
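The seed, length, and filename advice for these nodes can be applied programmatically to an API-format export of the graph. This is a minimal sketch, assuming the node IDs from the list (`"3"`, `"11"`, `"19"`) survive the export unchanged and that the nodes expose ComfyUI's core input names (`seed`, `seconds`, `filename_prefix`); check both against your own JSON. `seed_variants` and `fill_length_token` are hypothetical helpers, and the `"<n> seconds"` wording used for the `AUDIO_LENGTH` substitution is an assumption, since the exact text the workflow's Text Replace nodes insert is not documented here.

```python
import copy
from typing import Dict, Iterable, Iterator

# Node IDs taken from the key-node list; the input names below are assumed to
# match ComfyUI's core KSampler, EmptyLatentAudio and SaveAudioMP3 nodes.
KSAMPLER_ID = "3"
LATENT_ID = "11"
SAVE_ID = "19"


def fill_length_token(prompt_text: str, seconds: float) -> str:
    """Hypothetical helper: replace the AUDIO_LENGTH placeholder with the clip length.

    The "<n> seconds" wording is an assumption; match whatever your Text Replace
    nodes actually insert.
    """
    return prompt_text.replace("AUDIO_LENGTH", f"{seconds:g} seconds")


def seed_variants(
    workflow: Dict[str, dict], seeds: Iterable[int], seconds: float, prefix: str
) -> Iterator[Dict[str, dict]]:
    """Yield one copy of the API-format workflow per seed.

    Only the seed and filename prefix change between copies; the clip length is
    set once so every take has the same duration.
    """
    for seed in seeds:
        wf = copy.deepcopy(workflow)  # never mutate the caller's base graph
        wf[LATENT_ID]["inputs"]["seconds"] = seconds
        wf[KSAMPLER_ID]["inputs"]["seed"] = seed
        wf[SAVE_ID]["inputs"]["filename_prefix"] = f"{prefix}_seed{seed}"
        yield wf
```

Pass the same `seconds` value to both helpers so the latent length and the prompt's stated duration agree. Each variant differs only in seed and filename prefix, so the saved MP3s can be compared side by side; submit each one to your local ComfyUI server as you would any API-format workflow.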
Notes
Check the RunComfy page for this workflow for the latest node requirements.

