Stable Audio Open 1.0 in ComfyUI | Text-to-Music Workflow

Model description

Turns text prompts into cinematic music seamlessly and fast.

Who it's for: creators who want this pipeline in ComfyUI without assembling nodes from scratch. Not for: one-click results with zero tuning — you still choose inputs, prompts, and settings.

Open preloaded workflow on RunComfy

Why RunComfy first
- Fewer missing-node surprises — run the graph in a managed environment before you mirror it locally.
- Quick GPU tryout — useful if your local VRAM or install time is the bottleneck.
- Matches the published JSON — the zip follows the same runnable workflow you can open on RunComfy.

When downloading for local ComfyUI makes sense — you want full control over models on disk, batch scripting, or offline runs.

How to use (local ComfyUI)
1. Check the marked loader nodes. This text-to-music graph starts from prompts and an empty audio latent, so there is no image, video, or audio file to load.
2. Set prompts, clip duration, and seeds; start with a short test run.
3. Export from the Save / Write nodes shown in the graph.
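
For the batch-scripting case mentioned above, the same graph can be queued from a script instead of the UI. The following is a minimal sketch, not the workflow author's method. It assumes a ComfyUI server running at the default local address and a workflow exported in API format; the address and the file name `workflow_api.json` in the usage note are assumptions for illustration.

```python
import json
import urllib.request

COMFYUI_URL = "http://127.0.0.1:8188"  # assumption: default local ComfyUI address


def build_queue_request(workflow: dict, base_url: str = COMFYUI_URL) -> urllib.request.Request:
    """Wrap an API-format workflow in a POST request for ComfyUI's /prompt endpoint."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def queue_workflow(workflow: dict, base_url: str = COMFYUI_URL) -> dict:
    """Send the workflow to a running ComfyUI server and return its JSON reply."""
    with urllib.request.urlopen(build_queue_request(workflow, base_url)) as response:
        return json.loads(response.read())
```

With a server running, a call such as `queue_workflow(json.load(open("workflow_api.json")))` would submit one job. Building the request separately from sending it makes the payload easy to inspect before anything reaches the GPU.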

Expectations — First run may pull large weights; cloud runs may require a free RunComfy account.


Overview

Generate expressive soundscapes and musical compositions from written prompts using this text-to-music workflow. Built on the Stable Audio Open 1.0 audio diffusion model, it gives you control over duration, tone, and emotion through the prompt and sampler settings. It suits designers and creators seeking cinematic or ambient sound outputs. The workflow encodes your text into conditioning and processes it into realistic, listenable audio, with consistent quality across creative themes and moods.

Key nodes in the ComfyUI Stable Audio workflow

  • CLIPTextEncode (#6)
    This node encodes your positive prompt into conditioning that Stable Audio follows. Prioritize clear instrument lists, genre, mood, tempo or BPM, and production terms like “warm,” “lo-fi,” “cinematic,” or “ambient.” Subtle wording changes can meaningfully shift the composition. See ComfyUI core nodes for general behavior.

  • CLIPTextEncode (#7)
    The negative prompt helps avoid unwanted timbres or mix issues. Add terms that describe what to remove, for example “screechy, metallic ringing, glitch pops, radio hiss.” Keeping this concise often yields cleaner Stable Audio renders.

  • EmptyLatentAudio (#11)
    Controls the clip duration in seconds and optionally the batch count for multiple variations. Increase seconds for longer pieces, noting that computation scales with length. Use batch generation to audition several Stable Audio takes from a single prompt.

  • KSampler (#3)
    Drives the diffusion process for audio latents. The most influential controls are steps, sampler, cfg, and seed. Raise steps for more refined detail, adjust cfg to balance prompt adherence with creativity, and set a fixed seed to reproduce a take or vary it for new ideas. Refer to ComfyUI’s sampler notes for general guidance.

  • SaveAudioMP3 (#19)
    Exports the final waveform to an MP3. Use the filename_prefix to label versions and keep iterations tidy. When comparing prompts or seeds, saving multiple takes side by side makes it faster to choose between Stable Audio renders.
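
The controls described in the list above can also be set programmatically on an API-format copy of the workflow. The sketch below is illustrative only: the node IDs (6, 7, 11, 3, 19) come from the list, while the input names (`text`, `seconds`, `seed`, `filename_prefix`) are assumptions based on stock ComfyUI nodes and should be checked against your own export. The `template` dict is a stub holding just these five nodes with placeholder values, not the full graph.

```python
import copy

# Node IDs from the list above; input names are assumptions to verify in your export.
POSITIVE, NEGATIVE, LATENT, SAMPLER, SAVE = "6", "7", "11", "3", "19"


def make_take(workflow: dict, positive: str, negative: str,
              seconds: float, seed: int, prefix: str) -> dict:
    """Return a copy of the workflow with one take's settings filled in."""
    take = copy.deepcopy(workflow)  # leave the loaded template untouched
    take[POSITIVE]["inputs"]["text"] = positive
    take[NEGATIVE]["inputs"]["text"] = negative
    take[LATENT]["inputs"]["seconds"] = seconds
    take[SAMPLER]["inputs"]["seed"] = seed
    # Encode the seed in the file name so takes are easy to compare later.
    take[SAVE]["inputs"]["filename_prefix"] = f"{prefix}_seed{seed}"
    return take


# Stub only: a real export has more nodes (model loaders, audio decode)
# and more inputs per node. All values here are placeholders.
template = {
    "6": {"class_type": "CLIPTextEncode", "inputs": {"text": ""}},
    "7": {"class_type": "CLIPTextEncode", "inputs": {"text": ""}},
    "11": {"class_type": "EmptyLatentAudio", "inputs": {"seconds": 10.0}},
    "3": {"class_type": "KSampler", "inputs": {"seed": 0}},
    "19": {"class_type": "SaveAudioMP3", "inputs": {"filename_prefix": "audio"}},
}

# Three takes of the same prompt that differ only in seed.
takes = [
    make_take(
        template,
        positive="warm ambient pads, slow strings, cinematic, 70 BPM",
        negative="screechy, metallic ringing, glitch pops, radio hiss",
        seconds=30.0,
        seed=seed,
        prefix="ambient_pads",
    )
    for seed in (1, 2, 3)
]
```

Holding the prompt and duration fixed while varying only the seed keeps the comparison between takes fair, and the seed in each file name lets you reproduce the one you prefer.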

Notes

The RunComfy page for this workflow (Stable Audio Open 1.0 in ComfyUI | Text-to-Music Workflow) lists the latest node requirements.