Stable Audio 3 Sound Asset Generation Workflow

Watch the full video first if you want to understand how this Stable Audio 3 workflow works in practice. The video shows how a simple text idea can be expanded into a structured audio prompt, how different sound categories affect the result, and how to launch the workflow online without building the full ComfyUI audio environment locally.

This ComfyUI workflow is designed for Stable Audio 3 sound asset generation. Its main purpose is to turn text descriptions into usable audio assets: music tracks, instrument loops, sound effects, one-shot samples, ambience, cinematic hits, UI sounds, game audio elements, and other production-ready sound material. Rather than sampling directly from a short, loosely worded prompt, the workflow adds a category-aware prompt expansion layer so that the final Stable Audio prompt is more precise and better matched to the target audio type.

The workflow is built around the Stable Audio 3 Medium Base route. It uses stable_audio_3_medium_base.safetensors as the main checkpoint, t5gemma_b_b_ul2.safetensors as the Stable Audio text encoder, and a separate text-generation route for intelligent prompt rewriting. The generation pipeline uses CLIPTextEncode for positive and negative conditioning, EmptyLatentAudio for defining the target audio duration, KSampler for latent audio sampling, and VAEDecodeAudio to decode the generated latent into an actual audio waveform.
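The node chain described above can be sketched in ComfyUI's API format (the JSON layout used when queueing a graph), written here as a Python dict. Treat this as a rough illustration rather than the published workflow's actual export: CLIPTextEncode, EmptyLatentAudio, KSampler, and VAEDecodeAudio come from the description, while the two loader nodes, the `type` value on the text encoder loader, and every sampler setting are assumptions or placeholders. The separate text-generation route for prompt rewriting is left out.

```python
def build_audio_graph(positive, negative, seconds, seed):
    """Return a ComfyUI API-format graph for one text-to-audio run (sketch only)."""
    return {
        # Assumption: the checkpoint is loaded with the standard loader node.
        "1": {"class_type": "CheckpointLoaderSimple",
              "inputs": {"ckpt_name": "stable_audio_3_medium_base.safetensors"}},
        # Assumption: the text encoder is loaded separately; the "type" value is a guess.
        "2": {"class_type": "CLIPLoader",
              "inputs": {"clip_name": "t5gemma_b_b_ul2.safetensors",
                         "type": "stable_audio"}},
        # Positive and negative conditioning share the same text encoder.
        "3": {"class_type": "CLIPTextEncode",
              "inputs": {"text": positive, "clip": ["2", 0]}},
        "4": {"class_type": "CLIPTextEncode",
              "inputs": {"text": negative, "clip": ["2", 0]}},
        # The empty latent defines the target duration in seconds.
        "5": {"class_type": "EmptyLatentAudio",
              "inputs": {"seconds": seconds, "batch_size": 1}},
        # Sampler settings below are placeholders, not tuned values.
        "6": {"class_type": "KSampler",
              "inputs": {"model": ["1", 0], "positive": ["3", 0],
                         "negative": ["4", 0], "latent_image": ["5", 0],
                         "seed": seed, "steps": 50, "cfg": 5.0,
                         "sampler_name": "euler", "scheduler": "normal",
                         "denoise": 1.0}},
        # Decode the sampled latent into a waveform with the checkpoint's VAE.
        "7": {"class_type": "VAEDecodeAudio",
              "inputs": {"samples": ["6", 0], "vae": ["1", 2]}},
    }


graph = build_audio_graph("dark cinematic drone", "low quality", 30.0, 42)
```

Each `["id", n]` pair points at output slot `n` of another node, so the dict shows how duration, conditioning, sampling, and decoding connect. If you deploy locally, compare it against the real exported workflow before relying on any of the placeholder values.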

The most important design element in this workflow is the optional reprompt system. Users enter a short idea and then decide whether to enable prompt expansion. When reprompt is enabled, the workflow uses a category-aware prompt template. The available categories are Music, Instrument, SFX, and One-shot, and each has its own prompt rules. Music prompts focus on genre, instruments, layers, rhythm, mood, BPM, and track length. Instrument prompts focus on playing technique, timbre, production texture, BPM, and loop or stem length. SFX prompts focus on sound source, material texture, spatial environment, movement, attack, decay, and duration. One-shot prompts focus on short isolated audio samples such as hits, stabs, plucks, drum sounds, impacts, or short sound design elements.
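The category rules above can be pictured as a small template with slots for the user idea and the audio length. The sketch below is only an illustration of that idea: the template wording, the `CATEGORY_FOCUS` table, and the function name `build_reprompt_request` are hypothetical and are not taken from the workflow itself.

```python
# Focus points per category, paraphrased from the description above.
CATEGORY_FOCUS = {
    "Music": "genre, instruments, layers, rhythm, mood, BPM, and track length",
    "Instrument": "playing technique, timbre, production texture, BPM, "
                  "and loop or stem length",
    "SFX": "sound source, material texture, spatial environment, movement, "
           "attack, decay, and duration",
    "One-shot": "a short isolated sample such as a hit, stab, pluck, "
                "drum sound, or impact",
}

# Hypothetical template text; the real workflow's wording is not published here.
TEMPLATE = (
    "Rewrite the idea below as a detailed {category} prompt for Stable Audio. "
    "Focus on {focus}. Target length: {seconds} seconds.\n"
    "Idea: {idea}"
)


def build_reprompt_request(category, idea, seconds):
    """Fill the category template with the user idea and the audio length."""
    if category not in CATEGORY_FOCUS:
        raise ValueError(f"unknown category: {category!r}")
    return TEMPLATE.format(
        category=category,
        focus=CATEGORY_FOCUS[category],
        seconds=seconds,
        idea=idea.strip(),
    )


print(build_reprompt_request("SFX", "heavy metal door slam", 3))
```

The filled-in text would then go to the text-generation route, which returns the expanded prompt that Stable Audio actually receives.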

This structure makes the workflow more practical than a simple text-to-audio setup. In ordinary audio generation, a vague prompt like “dark cinematic sound” may produce inconsistent results. Here, the same idea can be expanded into a more technical and production-oriented prompt, including instrument details, ambience, rhythm, physical texture, stereo space, and length. That gives the Stable Audio model clearer instructions and makes the output easier to use in real projects.

This workflow is suitable for video creators, AI filmmakers, game developers, music producers, sound designers, short drama editors, YouTube creators, Bilibili creators, RunningHub users, and Civitai workflow collectors. It can be used to create background music, transition sounds, UI feedback, horror stingers, cinematic impacts, ambience beds, Foley-style effects, instrument loops, and short production samples.

Compared with ordinary audio workflows, this version is more structured, easier to control, and better for repeated asset production. You do not need to manually write a professional audio prompt every time. You can start with a rough idea, choose the audio category, set the duration, control the seed, and let the workflow generate a more useful Stable Audio prompt before sampling.

Main features:

Stable Audio 3 sound asset generation workflow
Text-to-audio generation inside ComfyUI
Stable Audio 3 Medium Base checkpoint route
T5Gemma Stable Audio text encoder
Optional intelligent reprompt system
Music / Instrument / SFX / One-shot category presets
User input replacement and audio length insertion
EmptyLatentAudio duration control
KSampler latent audio generation
VAEDecodeAudio waveform decoding
Suitable for music, ambience, SFX, loops, and one-shot samples
Online RunningHub execution without local setup

Suggested workflow:

Start with a short audio idea. Choose the category that matches your target output: Music for full tracks, Instrument for loops or stems, SFX for environmental or action sounds, and One-shot for short isolated samples. Set the duration according to the asset type. Enable reprompt if you want the workflow to expand your rough idea into a more detailed technical audio prompt. If you have already written a strong prompt yourself, disable reprompt and send the text directly into Stable Audio. Run a first seed test, listen carefully, then adjust the category, duration, BPM language, instrument description, texture, or spatial details until the generated asset matches your production needs.
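The reprompt switch in the steps above amounts to a simple branch: either the user's text goes straight to Stable Audio, or it passes through the expansion step first. Here is a minimal sketch of that routing. The function name `choose_final_prompt` and the `expand` callable are hypothetical stand-ins for the workflow's text-generation route, and the fallback to the original text when expansion returns nothing is an assumption, not documented behavior.

```python
from typing import Callable


def choose_final_prompt(user_text: str, reprompt: bool,
                        expand: Callable[[str], str]) -> str:
    """Return the prompt that would be sent to Stable Audio."""
    text = user_text.strip()
    if not text:
        raise ValueError("prompt is empty")
    if not reprompt:
        # A strong hand-written prompt is passed through unchanged.
        return text
    expanded = expand(text).strip()
    # Assumption: if expansion yields nothing, keep the original idea.
    return expanded or text


def fake_expand(idea: str) -> str:
    """Stand-in for the prompt-rewriting model, for demonstration only."""
    return f"{idea}, wide stereo field, long reverb tail, 10 seconds"


print(choose_final_prompt("dark cinematic sound", True, fake_expand))
print(choose_final_prompt("dark cinematic sound", False, fake_expand))
```

Keeping the switch explicit makes it easy to compare the same seed with and without expansion while you test.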

⚙️ RunningHub Workflow

Try the workflow online right now — no installation required.
👉 Workflow: https://www.runninghub.ai/post/2058938930971635714?inviteCode=rh-v1111

If the results meet your expectations, you can later deploy it locally for customization.

🎁 Fan Benefits: Register to get 1000 points, plus 100 points for each daily login, and run workflows on 4090 hardware with 48 GB of memory!

📺 Bilibili Updates (Mainland China & Asia-Pacific)

If you’re in Mainland China or the Asia-Pacific region, you can watch the video below to see the workflow demonstration and creative breakdown.
📺 Bilibili Video: https://www.bilibili.com/video/BV1gLGo6kEbF/

☕ Support Me on Ko-fi

If you find my content helpful and want to support future creations, you can buy me a coffee ☕.
Every bit of support helps me keep creating — just like a spark that can ignite a blazing flame.
👉 Ko-fi: https://ko-fi.com/aiksk

💼 Business Contact

For collaboration or inquiries, please contact aiksk95 on WeChat.

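I keep updating model resources on Quark Netdisk:
👉 https://pan.quark.cn/s/20c6f6f8d87b
These resources are mainly intended for local users, to make creation and learning easier.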

Model Type	Workflows
Base Model	LTXV 2.3
Published	2026-05-26
