Z-Image Turbo Qwen-VL Auto-Prompt: Anime to Real (Multi-Control Switch)

💡 Workflow Logic: This workflow addresses the two biggest pain points in anime-to-realistic conversion: prompts that are hard to write, and compositions that drift from the original.

✨ Key Features:

  1. 🧠 Powered by Qwen-VL (automatic prompt inference):

    • No need to type prompts by hand. The workflow uses the Qwen-VL vision-language model to analyze your input image and automatically generate a detailed, accurate natural-language prompt.

  2. 🎮 Switchable ControlNets (three-way control switch):

    • Includes Depth, HED (SoftEdge), and Pose ControlNets.

    • Switch System: A one-click switch lets you toggle between them depending on the image type (e.g., use HED for details, Depth for structure).

  3. 🔒 Composition Lock:

    • Preserves the character's pose, outfit outline, and background structure. Only the art style changes to realistic; the content stays almost untouched.

📝 How to use:

  • Input: Upload your Anime/Illustration image.

  • Select Control: Choose which ControlNet (Depth/HED/Pose) fits your image best via the switch.

  • Run: Let Qwen-VL describe it and Z-Image generate it.
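The "Select Control" step can be thought of as a simple lookup from what you most want to preserve to the ControlNet to enable. The snippet below is only an illustrative sketch, not part of the workflow itself: the function name and the dictionary are hypothetical, and the "pose" entry is an assumption (the description above only states HED for details and Depth for structure).

```python
# Hypothetical mapping from "what matters most in this image" to a ControlNet.
# "details" and "structure" follow the workflow description; "pose" is assumed.
CONTROL_FOR_PRIORITY = {
    "details": "HED",
    "structure": "Depth",
    "pose": "Pose",
}


def pick_controlnet(priority):
    """Return the ControlNet name suggested for the given priority."""
    key = priority.strip().lower()
    if key not in CONTROL_FOR_PRIORITY:
        options = ", ".join(sorted(CONTROL_FOR_PRIORITY))
        raise ValueError(f"Unknown priority {priority!r}; expected one of: {options}")
    return CONTROL_FOR_PRIORITY[key]
```

In the actual workflow this choice is made with the switch node, so treat the code purely as a reminder of which option fits which kind of image.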

💡 Tips & Troubleshooting

1. Best Input Source

This workflow performs best on images with a clear body structure and minimal occlusion. For the most accurate anatomy and lighting, use a clean Base Body (Clean Anatomy) image or a character in simply cut clothing as input. Heavily layered, overly complex clothing can interfere with the ControlNet's skeleton detection.

2. Still looking too 2D?

Due to the nature of Turbo models, the first pass sometimes retains a "plastic" or 2D look. If this happens, feed the output image back into the workflow as the input and run it again (Loopback). A second pass is usually enough to remove the remaining anime traces and solidify the realistic texture.
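The Loopback idea is just "use the previous output as the next input". A minimal sketch, assuming a hypothetical `run_workflow` callable that takes an image and returns the generated image (how you actually invoke the workflow is up to your setup):

```python
def run_loopback(image, run_workflow, passes=2):
    """Run the workflow `passes` times, feeding each output back in as input.

    `run_workflow` is a hypothetical callable (image -> image) standing in
    for one full run of the workflow.
    """
    if passes < 1:
        raise ValueError("passes must be at least 1")
    result = image
    for _ in range(passes):
        result = run_workflow(result)
    return result
```

Two passes matches the tip above; more passes are possible, but each one costs another full run.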

3. Complex Scenes & Small Subjects

If the character occupies only a small part of the input image, or wears an extremely intricate outfit (such as complex armor or lace), the AI may not have enough resolution to work with and can render those details blurry. This is normal. Try generating several images (Reroll) and picking the best one, or increase the input resolution.
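If you want to raise the input resolution before running the workflow, a small helper can compute the target size while keeping the aspect ratio. This is a rough sketch under stated assumptions: the 1024-pixel minimum short side and the snap-to-16 rounding are illustrative defaults, not values taken from the workflow.

```python
def upscale_size(width, height, min_short_side=1024, multiple=16):
    """Return (new_width, new_height) with the short side at least min_short_side.

    Both defaults are assumptions for illustration. Images that are already
    large enough are never scaled down, only snapped to the given multiple.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    scale = max(1.0, min_short_side / min(width, height))

    def snap(value):
        # Round to the nearest multiple, never below one multiple.
        return max(multiple, round(value * scale / multiple) * multiple)

    return snap(width), snap(height)


print(upscale_size(512, 768))  # (1024, 1536)
```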

4. Anatomy Warning

ControlNet faithfully preserves the skeletal proportions of the original image. If your input has heavily exaggerated anime proportions (e.g., oversized eyes, stick-thin legs), the realistic result may land in the "Uncanny Valley" and look unsettling. Input images with realistic anatomical proportions are recommended.

5. Required LLM Models

To ensure accurate prompt generation, download the following Qwen-VL model files and place them in your ComfyUI/models/LLM folder:

Huihui-Qwen3-VL-4B-Instruct-abliterated-Q6_K.gguf

Huihui-Qwen3-VL-4B-Instruct-abliterated.mmproj-f16.gguf
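Before the first run, a quick check that both files are in place can save a failed run. The sketch below uses only the file names and folder listed above; the function name is hypothetical, and the "ComfyUI" root path in the example is an assumption you should adjust to your own install location.

```python
from pathlib import Path

# File names as listed above.
REQUIRED_FILES = [
    "Huihui-Qwen3-VL-4B-Instruct-abliterated-Q6_K.gguf",
    "Huihui-Qwen3-VL-4B-Instruct-abliterated.mmproj-f16.gguf",
]


def missing_llm_files(comfyui_root):
    """Return the required file names not found in <comfyui_root>/models/LLM."""
    llm_dir = Path(comfyui_root) / "models" / "LLM"
    return [name for name in REQUIRED_FILES if not (llm_dir / name).is_file()]


if __name__ == "__main__":
    # "ComfyUI" is an assumed install path; change it to match your setup.
    missing = missing_llm_files("ComfyUI")
    print("All model files found." if not missing else f"Missing: {missing}")
```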

6. ⚙️ System Requirements & Performance

Tested Specs:

RAM: 32GB

GPU: NVIDIA RTX 5060 Ti (16GB VRAM)

Run Time:

The full workflow takes approximately 70-90 seconds on the tested specs above.
