Qwen3-TTS Ultimate Pack (Cloning + Design + Low VRAM)
# 🚨 UPDATE V1.5 (Jan 24, 2026) - CRITICAL FIX
Please update to this version immediately!
The previous version (v1.0) may crash due to a recent "Breaking Change" in the ComfyUI-Qwen3-TTS custom nodes.
✅ Fixes in v1.5:
* Fixed Crash: Solved the `Unsupported speakers: fixed` error.
* Plug & Play: Removed all personal file paths (no more "File Not Found" errors on first run).
* Potato Mode Guide: Added a visual guide inside the workflow for switching to the 0.6B model.
---
# 🎧 Qwen3-TTS Ultimate Pack (Voice Design & Cloning)
This is a beginner-friendly workflow for the newly released Qwen3-TTS model. It is optimized to run on consumer hardware with as little as 6GB VRAM (tested and working perfectly on a GTX 1060).
💡 POTATO PC MODE (<4GB VRAM):
If the workflow crashes, change the `repo_id` in the loader to `Qwen/Qwen3-TTS-12Hz-0.6B-Base`. It is faster and uses about half the memory, but has slightly less emotional range.
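The Potato PC rule above comes down to a single VRAM threshold. Here is a minimal sketch, assuming you already know roughly how much VRAM your card has. The helper name `pick_repo_id` is hypothetical and is not part of ComfyUI-Qwen3-TTS. It only returns the string you would type into the loader's `repo_id` field, and leaves your current default untouched on larger cards.

```python
# Hypothetical helper (not part of the node pack): encodes the "Potato PC Mode"
# rule of thumb from this page and returns the repo_id to put in the loader.

POTATO_REPO_ID = "Qwen/Qwen3-TTS-12Hz-0.6B-Base"  # smaller model named in the guide


def pick_repo_id(vram_gb: float, default_repo_id: str, threshold_gb: float = 4.0) -> str:
    """Return the 0.6B repo_id when VRAM is below the threshold.

    default_repo_id is whatever your loader already uses; it is returned
    unchanged on cards at or above the threshold.
    """
    if vram_gb <= 0:
        raise ValueError("vram_gb must be positive")
    if vram_gb < threshold_gb:
        return POTATO_REPO_ID
    return default_repo_id


if __name__ == "__main__":
    current = "<repo_id already in your loader>"  # placeholder, not a real id
    for vram in (3.0, 6.0):
        print(f"{vram} GB -> {pick_repo_id(vram, current)}")
```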
I created this because the new nodes can be confusing for beginners. This download combines two separate groups in one workflow, controlled by a single switcher node (rgthree's "Fast Groups Bypasser").
## 🚀 What's Included?
### Workflow 1: Voice Design (Text-to-Speech)
* Best for: Narrators, Movie Trailers, Assistant Voices.
* Uses the VoiceDesign model for high-quality, directed acting.
* Includes a preconfigured "Instruct" field so you can direct the emotion (e.g., "Sad whisper", "Angry shout").
### Workflow 2: Voice Cloning (Audio-to-Speech)
* Best for: Cloning specific voices (yourself, friends, characters).
* Uses the Base model + Reference Audio.
* Pro Tip: I've set it up to accept `ref_text` (the transcript of the reference audio), which significantly improves cloning accuracy.
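The two groups expect different inputs: Voice Design needs text plus an acting direction, and Voice Cloning needs text plus a reference clip, ideally with its transcript. The sketch below is a rough illustration of those inputs and a pre-flight check. The class and field names are hypothetical. They mirror the "Instruct" and `ref_text` fields described above, but they are not the node pack's actual API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VoiceDesignRequest:
    """Workflow 1 (hypothetical shape): text plus a natural-language direction."""
    text: str
    instruct: str  # e.g. "Sad whisper", "Angry shout"

    def validate(self) -> List[str]:
        """Raise on missing required inputs; return non-fatal warnings."""
        if not self.text.strip():
            raise ValueError("text is empty")
        if not self.instruct.strip():
            raise ValueError("Voice Design needs an instruct prompt")
        return []


@dataclass
class VoiceCloneRequest:
    """Workflow 2 (hypothetical shape): text plus a reference clip."""
    text: str
    ref_audio_path: str
    ref_text: Optional[str] = None  # transcript of the reference clip

    def validate(self) -> List[str]:
        """Raise on missing required inputs; return non-fatal warnings."""
        if not self.text.strip():
            raise ValueError("text is empty")
        if not self.ref_audio_path:
            raise ValueError("Voice Cloning needs a reference audio file")
        warnings = []
        if not self.ref_text:
            warnings.append("no ref_text given; cloning accuracy may drop")
        return warnings


if __name__ == "__main__":
    clone = VoiceCloneRequest(text="Hello there!", ref_audio_path="me.wav")
    print(clone.validate())
```

Treating a missing `ref_text` as a warning rather than an error matches the tip above: cloning works without it, just less accurately.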
## ⚙️ Requirements
1. ComfyUI Manager installed.
2. Qwen3 Nodes: You need ComfyUI-Qwen3-TTS (Author: DarioFT / ID: 3172 in Manager).
3. Utility Nodes: You need rgthree-comfy (via Manager) for the mode switcher to work.
* (Note: If you don't want to install rgthree, you can disable a group manually: select its nodes and press Ctrl+B to bypass them, or Ctrl+M to mute them.)
## 📝 How to Use (New Easy Mode)
I have organized the workflow into two distinct color-coded groups. You don't need to wire anything manually!
The Control Switch: Look for the "Fast Groups Bypasser" node on the left.
1. For Text-to-Speech: Set Enable Voice Design to "yes" and Cloning to "no".
2. For Cloning: Set Enable Voice Cloning to "yes" and Design to "no".
* Note: Only enable ONE group at a time to save VRAM, especially on 6GB cards like the GTX 1060.
Visual Guide:
* 🟦 **Pale Blue Group (Top)** = Voice Design.
* 🟦 **Cyan Group (Bottom)** = Voice Cloning.
* Visual Cue: If the nodes inside a group turn darker/muted, it means that group is Bypassed (OFF).
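A bypassed node is just a per-node flag saved in the workflow file, so the switch can also be flipped outside the UI. The following is a minimal sketch, assuming the standard ComfyUI UI-format workflow JSON: each group has a `title` and a `bounding` box `[x, y, width, height]`, and each node has a `pos` and a `mode` (`0` = active, `4` = bypassed). It enables one group and bypasses the others, which is roughly what the switcher does for you. The group titles in the demo are hypothetical, so check the titles in your copy of the workflow.

```python
import json

ACTIVE, BYPASS = 0, 4  # ComfyUI node "mode" values: 0 = always run, 4 = bypass


def node_xy(node: dict) -> tuple:
    """Node position; some exports store pos as {"0": x, "1": y} instead of [x, y]."""
    pos = node["pos"]
    if isinstance(pos, dict):
        return pos["0"], pos["1"]
    return pos[0], pos[1]


def node_in_group(node: dict, group: dict) -> bool:
    """True if the node's top-left corner lies inside the group's bounding box."""
    x, y = node_xy(node)
    gx, gy, gw, gh = group["bounding"]
    return gx <= x <= gx + gw and gy <= y <= gy + gh


def enable_only(workflow: dict, enabled_title: str) -> dict:
    """Activate nodes in the named group and bypass nodes in every other group."""
    titles = [g["title"] for g in workflow.get("groups", [])]
    if enabled_title not in titles:
        raise KeyError(f"no group titled {enabled_title!r}; found {titles}")
    for group in workflow["groups"]:
        mode = ACTIVE if group["title"] == enabled_title else BYPASS
        for node in workflow["nodes"]:
            if node_in_group(node, group):
                node["mode"] = mode
    return workflow


if __name__ == "__main__":
    # Tiny in-memory stand-in for a real workflow file (hypothetical titles).
    demo = {
        "groups": [
            {"title": "Voice Design", "bounding": [0, 0, 500, 300]},
            {"title": "Voice Cloning", "bounding": [0, 400, 500, 300]},
        ],
        "nodes": [
            {"id": 1, "pos": [50, 50], "mode": 0},
            {"id": 2, "pos": [50, 450], "mode": 0},
        ],
    }
    enable_only(demo, "Voice Cloning")
    print(json.dumps([(n["id"], n["mode"]) for n in demo["nodes"]]))
```

To apply it to a real file, load the workflow with `json.load`, call `enable_only`, and write it back with `json.dump`. Nodes outside every group (such as the switcher itself) are left alone. A node inside two overlapping groups ends up with the mode of whichever group is listed last.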
## 💡 Performance Note
* VRAM Usage: ~3.5GB to 5GB (depending on model choice).
* Speed: Fast generation even on older cards (GTX 10xx series).
Enjoy making your AI speak! And if this saved your day, please give it a thumbs-up! ⭐


