ComfyUI F5 TTS Workflow | Text to Speech & Voice Cloning
Turn text into rich, expressive voices with natural tone control.
Who it's for: creators who want this pipeline in ComfyUI without assembling nodes from scratch. Not for: one-click results with zero tuning — you still choose inputs, prompts, and settings.
Open preloaded workflow on RunComfy
Why RunComfy first
- Fewer missing-node surprises — run the graph in a managed environment before you mirror it locally.
- Quick GPU tryout — useful if your local VRAM or install time is the bottleneck.
- Matches the published JSON — the zip follows the same runnable workflow you can open on RunComfy.
When downloading for local ComfyUI makes sense: you want full control over models on disk, batch scripting, or offline runs.
How to use (local ComfyUI)
1. Load your reference audio in the marked loader nodes, or record a sample in the graph.
2. Enter the script text, choose the model preset and vocoder, and set the seed; start with a short test run.
3. Export from the Save / Write nodes shown in the graph.
Expectations — First run may pull large weights; cloud runs may require a free RunComfy account.
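For the batch-scripting case mentioned above, a local ComfyUI server accepts queued jobs over HTTP at its `/prompt` endpoint. The following is a minimal sketch, assuming the server runs at the default address `http://127.0.0.1:8188` and that the workflow has been exported from ComfyUI in API format (the JSON in the zip may be in the UI format and need re-exporting). The helper names `build_prompt_request` and `queue_workflow` are hypothetical.

```python
import json
import urllib.request
import uuid
from typing import Optional

# Assumption: ComfyUI is running locally on its default port.
COMFY_URL = "http://127.0.0.1:8188"


def build_prompt_request(workflow: dict, client_id: Optional[str] = None) -> urllib.request.Request:
    """Wrap an API-format workflow in a POST request for the /prompt endpoint."""
    payload = {
        "prompt": workflow,
        "client_id": client_id or uuid.uuid4().hex,
    }
    return urllib.request.Request(
        f"{COMFY_URL}/prompt",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def queue_workflow(path: str) -> dict:
    """Load an API-format workflow file and queue it on the running server."""
    with open(path, encoding="utf-8") as f:
        workflow = json.load(f)
    with urllib.request.urlopen(build_prompt_request(workflow)) as resp:
        return json.load(resp)
```

Calling `queue_workflow("workflow_api.json")` (a placeholder file name) would submit one run; looping over several edited copies of the workflow is one way to batch lines of dialogue.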
Overview
This workflow turns text into expressive speech with F5 TTS. It can clone a voice from a short audio sample and lets you control timbre, tone, and pace. You can generate narrations, voiceovers, or character lines directly in ComfyUI. Fixed seeds make takes reproducible, and the graph can be wired into larger node pipelines for custom audio generation and prototyping under full local control.
Key nodes in ComfyUI F5 TTS workflow
F5TTSAudio (#15)
The core single-pass TTS node used across the EN, FR, DE, JP, F5v1, and E2 groups. Supply your script and choose the model preset and vocoder that suit your language and delivery. If you want reproducible takes, keep the seed fixed; if you want variety, randomize between runs. The implementation is provided by the ComfyUI-F5-TTS extension. GitHub: FishAudio/F5-TTS
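When driving the graph from a script, the fixed-versus-random seed choice can be made explicit. This is a sketch only: it assumes an API-format workflow dictionary in which node `"15"` is the F5TTSAudio node and that the node exposes its seed under an input named `seed`. Both the input name and the helper `with_seed` are assumptions, so check them against your exported JSON.

```python
import copy
import random
from typing import Optional


def with_seed(workflow: dict, node_id: str, seed: Optional[int] = None,
              seed_key: str = "seed") -> dict:
    """Return a copy of an API-format workflow with one node's seed set.

    Pass an integer to reproduce a take; pass None to draw a fresh seed.
    `seed_key` is an assumed input name, not confirmed by the extension.
    """
    patched = copy.deepcopy(workflow)  # leave the caller's workflow untouched
    if seed is None:
        seed = random.randrange(2**32)
    patched[node_id]["inputs"][seed_key] = seed
    return patched
```

For example, `with_seed(workflow, "15", seed=1234)` pins a take, while `with_seed(workflow, "15")` produces a variation; logging the drawn seed lets you return to a take you liked.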
F5TTSAudioInputs (#44)
The cloning entry point that consumes a reference WAV and its matching transcript to build a speaker representation, then synthesizes new lines in that voice. Use a clean sample with consistent loudness and ensure the transcript is exact to maximize similarity and reduce artifacts. Switch model presets or vocoders here if you need a brighter or more neutral decode. GitHub: FishAudio/F5-TTS
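A quick check of the reference WAV before cloning can catch clipped or very quiet samples. The sketch below uses only the Python standard library and assumes a 16-bit PCM file; the function name `describe_wav` is hypothetical, and what counts as an acceptable level is left to you, since the workflow does not publish thresholds.

```python
import array
import math
import sys
import wave


def describe_wav(path: str) -> dict:
    """Report duration, RMS level, and peak level of a 16-bit PCM WAV file."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        rate = wav.getframerate()
        channels = wav.getnchannels()
        frames = wav.getnframes()
        samples = array.array("h", wav.readframes(frames))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    if not samples:
        raise ValueError("file contains no audio")
    full_scale = 32768.0
    rms = math.sqrt(sum(s * s for s in samples) / len(samples)) / full_scale
    peak = max(abs(s) for s in samples) / full_scale
    return {
        "seconds": frames / rate,
        "channels": channels,
        "rms_dbfs": 20 * math.log10(rms) if rms > 0 else float("-inf"),
        "peak_dbfs": 20 * math.log10(peak) if peak > 0 else float("-inf"),
        "clipped": peak >= 32767 / full_scale,  # a sample hit the 16-bit limit
    }
```

Running it on two candidate samples and comparing `rms_dbfs` gives a rough sense of which one is more consistent in loudness; a `clipped` value of `True` suggests re-recording at a lower input gain.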
Apply Whisper (#13)
Automatic transcription for your reference sample. Pick a Whisper size that balances speed and accuracy for your hardware and language, then feed its output text to the cloning node so the transcript matches what was actually spoken. This prevents conditioning errors that can happen when the sample text differs from the audio.
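If you already have a hand-typed transcript, comparing it with the Whisper output is one way to spot words that differ from what was spoken. The following sketch uses Python's `difflib` on whitespace-separated words; the helper names are hypothetical, and word splitting like this suits languages written with spaces, so it would not work as-is for Japanese.

```python
import difflib
import re
import unicodedata


def normalize(text: str) -> list[str]:
    """Lowercase, drop punctuation, and split into words for comparison."""
    text = unicodedata.normalize("NFKC", text).lower()
    text = re.sub(r"[^\w\s']", " ", text)  # keep apostrophes inside words
    return text.split()


def transcript_mismatches(typed: str, transcribed: str) -> list[tuple[str, str]]:
    """Return (typed, transcribed) word spans that differ between two transcripts."""
    a, b = normalize(typed), normalize(transcribed)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (" ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
```

An empty result means the two transcripts agree after normalization; any returned pair is worth checking by ear against the reference sample before cloning.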
VrchAudioRecorderNode (#43)
An in-graph recorder that captures a short spoken prompt for cloning, removing the need for external tools. Hold to record, release to stop, and immediately hear how ComfyUI F5 TTS sounds in your own voice. Keep the mic close and reduce room noise for the cleanest result.
Notes
See the RunComfy page for the latest node requirements for this workflow.
