Ideogram 4 on Apple Silicon (Mac) — Regional bbox + Local LLM Workflow

Run Ideogram 4 locally on a Mac (M-series, Apple Silicon) in ComfyUI. Draw boxes to place people and objects, type a short label in each, and a local uncensored LLM expands them into Ideogram's structured JSON before Ideogram 4 (GGUF) renders the scene. No NVIDIA GPU required.

Other Ideogram 4 workflows assume an NVIDIA GPU or rely on the stock GGUF loader, which fails on Apple's MPS backend. This one is built for Mac and ships the patches that make it work.

What it does

1) BBOX EDITOR   draw a box per person/object, type a SHORT label in each
        |        (e.g. "redhead woman", "wine glass", "small dog")
        v
2) GEMMA         a local uncensored LLM expands every region + the whole image
        |        into rich JSON, keeping each box exactly where you placed it
        v
3) IDEOGRAM 4    renders each element in its region

  • Regional composition: position multiple subjects/objects precisely (box coordinates are preserved verbatim).

  • No JSON by hand: Ideogram 4 does not respond properly to plain text (it returns a grey "safety filter" box) and only renders as intended from a structured JSON prompt. The LLM writes that JSON for you from short labels.

  • Realistic, non-"AI-looking" results, plus Ideogram's strong text rendering (use a text box to render a sign / signature).

  • SFW by default. NSFW-capable (see below).

  • The architecture patch is platform-agnostic, so it also works on CUDA — but the focus is Apple Silicon.

SFW / NSFW

Out of the box, neutral labels give safe-for-work images. For NSFW: the local Gemma is uncensored, and the optional Vintage Beauties / mi55ionary LoRAs (on CivitAI) add explicit anatomy. Load one on the LoRA node and use explicit labels.

Requirements

  • Mac with Apple Silicon (M1/M2/M3/M4/M5), 32 GB unified memory or more recommended.

  • ComfyUI (manual install or the Desktop app).

  • ~30 GB free disk for the models.
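
The requirements above can be checked from a terminal before installing. The following is a minimal sketch, not part of the attached package: the function names are hypothetical, the 32 and 30 thresholds simply mirror the list above, and it treats GB and GiB as interchangeable for a rough check. It relies on the macOS `sysctl -n hw.memsize` value (bytes of memory) and on `df -Pk` (free space in KiB).

```shell
#!/usr/bin/env bash
# Preflight sketch (hypothetical helper names; thresholds taken from the requirements list).

# Convert a byte count to whole GiB (integer division).
bytes_to_gib() { echo $(( $1 / 1024 / 1024 / 1024 )); }

# Convert a KiB count (as printed by `df -k`) to whole GiB.
kib_to_gib() { echo $(( $1 / 1024 / 1024 )); }

# Print "ok" if VALUE >= MINIMUM, otherwise "low".
at_least() { if [ "$1" -ge "$2" ]; then echo ok; else echo low; fi; }

# Report memory and free disk for the volume holding TARGET (default: current directory).
preflight() {
  local target="${1:-.}"
  if [ "$(uname -s)" != "Darwin" ] || [ "$(uname -m)" != "arm64" ]; then
    echo "not an Apple Silicon Mac; skipping checks"
    return 0
  fi
  local ram_gib disk_gib
  ram_gib=$(bytes_to_gib "$(sysctl -n hw.memsize)")
  disk_gib=$(kib_to_gib "$(df -Pk "$target" | awk 'NR==2 {print $4}')")
  echo "unified memory: ${ram_gib} GiB ($(at_least "$ram_gib" 32))"
  echo "free disk:      ${disk_gib} GiB ($(at_least "$disk_gib" 30))"
}
```

After sourcing the file, `preflight /path/to/ComfyUI` prints both values with an ok/low marker. A "low" on memory is only a warning, since 32 GB is listed as recommended rather than required.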

Install — easy (script)

Download the attached package, then from a terminal:

cd ideogram4-mac-workflow
chmod +x install_mac.sh
./install_mac.sh                 # auto-detects ComfyUI, or:
./install_mac.sh /path/to/ComfyUI

It installs the three custom nodes, downloads every model to the right folder, and drops in the workflow + LLM prompts. Then restart ComfyUI and load Ideogram4_GGUF_Mac_Bbox_Gemma. (Optional NSFW LoRAs: export CIVITAI_TOKEN=... before running.)

Install — manual

1. Custom nodes (into ComfyUI/custom_nodes):

git clone https://github.com/fxd0h/ComfyUI-GGUF                # Ideogram-4 GGUF + an MPS dtype fix
git clone https://github.com/fxd0h/ComfyUI-LLM-text-processor  # runs the Gemma GGUF on Apple Silicon
git clone https://github.com/kijai/ComfyUI-KJNodes             # the bbox prompt builder

The first two are patched forks; the patches are also submitted upstream (city96/ComfyUI-GGUF PRs #455 and #456).

2. Models (download and place):

  • ideogram4-iQ4_NL.gguf + ideogram4_unconditional-iQ4_NL.gguf → models/unet/ideogram4-gguf/ (from stduhpf/ideogram-4-gguf on Hugging Face)

  • qwen3vl_8b_fp8_scaled.safetensors → models/text_encoders/ (from Comfy-Org/Ideogram-4, path text_encoders/)

  • flux2-vae.safetensors → models/vae/ (from Comfy-Org/Ideogram-4, path vae/)

  • gemma-4-E4B-it-ultra-uncensored-heretic-Q4_K_M.gguf → models/LLM/gemma4-e4b-uncensored/ (from llmfan46/gemma-4-E4B-it-ultra-uncensored-heretic-GGUF)

  • LoRAs (optional, NSFW) → models/loras/ (from CivitAI: Vintage Beauties, mi55ionary)

3. LLM prompts — copy ideogram4_json.txt and ideogram4_expand.txt (in the package, prompts/) into models/LLM/prompts/.

4. Workflow — copy Ideogram4_GGUF_Mac_Bbox_Gemma.json into ComfyUI/user/default/workflows/.

Restart ComfyUI. The Gemma node auto-downloads a small macOS llama.cpp binary on first run.
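
To confirm that the manual steps put everything where the workflow expects it, a small check like the one below can help. It is a sketch under assumptions, not something shipped in the package: `EXPECTED_PATHS` and `missing_paths` are hypothetical names, the paths are the ones listed in steps 1 to 4, and the custom node folders are assumed to keep their default clone names. The optional LoRAs are not checked.

```shell
#!/usr/bin/env bash
# Paths relative to the ComfyUI root, as listed in manual install steps 1-4.
EXPECTED_PATHS='custom_nodes/ComfyUI-GGUF
custom_nodes/ComfyUI-LLM-text-processor
custom_nodes/ComfyUI-KJNodes
models/unet/ideogram4-gguf/ideogram4-iQ4_NL.gguf
models/unet/ideogram4-gguf/ideogram4_unconditional-iQ4_NL.gguf
models/text_encoders/qwen3vl_8b_fp8_scaled.safetensors
models/vae/flux2-vae.safetensors
models/LLM/gemma4-e4b-uncensored/gemma-4-E4B-it-ultra-uncensored-heretic-Q4_K_M.gguf
models/LLM/prompts/ideogram4_json.txt
models/LLM/prompts/ideogram4_expand.txt
user/default/workflows/Ideogram4_GGUF_Mac_Bbox_Gemma.json'

# Print every expected path that does not exist under ROOT (one per line).
missing_paths() {
  local root="$1" rel
  while IFS= read -r rel; do
    [ -e "$root/$rel" ] || echo "$rel"
  done <<EOF
$EXPECTED_PATHS
EOF
}
```

Running `missing_paths /path/to/ComfyUI` prints nothing when all eleven entries are present; any line it prints is a file or folder still to be installed.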

How to use

  1. Bbox editor (node 1): on its canvas — drag to draw a box, double-click to edit (set its label and obj vs text), click to select, Del to remove, Ctrl/Cmd+C/V/D to copy/paste/duplicate. One box per subject/object with a short label.

  2. Gemma (node 2) expands every region + the whole image. Leave system_prompt = ideogram4_expand.txt.

  3. Image size / batch: set the SAME width/height in EmptyFlux2LatentImage and Ideogram4Scheduler; set the batch size on the latent node.

  4. CFG / seed / steps: set these on DualModelGuider (CFG ~3), RandomNoise, and Ideogram4Scheduler, respectively.

  5. Signature / text: add a text box (e.g. your name) — Ideogram renders it cleanly.

Speed

Render time on Mac: ~7 min per 1024px image at 20 steps (GGUF dequant on MPS is the bottleneck), plus ~10–30 s for the Gemma step. To go faster: swap DualModelGuider for BasicGuider (single model, ~2x faster), use 12 steps, or lower the resolution. For maximum speed, the mflux CLI runs the same model in ~90 s.
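
The figures above give a rough per-step cost: ~7 min (420 s) over 20 steps is about 21 s per step. The sketch below turns that into a back-of-the-envelope estimate. It assumes render time scales linearly with step count and that BasicGuider is exactly 2x faster, neither of which is measured here; `estimate_render_seconds` is a hypothetical helper, and resolution changes and the Gemma step are not modeled.

```shell
#!/usr/bin/env bash
# Rough render-time estimate in seconds for a 1024px image.
# usage: estimate_render_seconds STEPS [dual|basic]
estimate_render_seconds() {
  local steps="$1" guider="${2:-dual}"
  local per_step=$(( 420 / 20 ))   # ~7 min at 20 steps => 21 s per step (DualModelGuider)
  local total=$(( steps * per_step ))
  if [ "$guider" = "basic" ]; then
    total=$(( total / 2 ))         # assumed exact 2x from single-model guidance
  fi
  echo "$total"
}
```

Under these assumptions, 12 steps with DualModelGuider comes to about 252 s (a little over 4 min), and 12 steps with BasicGuider to about 126 s.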

Troubleshooting

  • "Unknown model architecture" on the GGUF node → you're on the stock node; use the patched fork fxd0h/ComfyUI-GGUF.

  • Gemma node errors about a binary → it downloads a macOS llama.cpp build on first run; allow it network access.

  • Grey image → your prompt reached Ideogram as plain text; keep the Gemma → CLIPTextEncode chain (JSON is what gets through).
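
Two of the cases above can be narrowed down from a terminal. This is a minimal sketch with hypothetical helper names: the first function checks whether a git remote URL points at the patched fork, and the second checks whether some text parses as JSON, using Python's standard `json.tool` module (assumes `python3` is on the PATH). How you capture the Gemma node's output text is up to you; copying it to the clipboard is just one option.

```shell
#!/usr/bin/env bash
# Succeeds if the given git remote URL points at the patched fork named above.
is_patched_gguf_fork() {
  case "$1" in
    *fxd0h/ComfyUI-GGUF*) return 0 ;;
    *) return 1 ;;
  esac
}

# Succeeds if stdin parses as JSON; a bare label such as "redhead woman" (no quotes) does not.
looks_like_json() {
  python3 -m json.tool >/dev/null 2>&1
}

# Example usage (run from the folder that contains ComfyUI; pbpaste reads the macOS clipboard):
#   is_patched_gguf_fork "$(git -C ComfyUI/custom_nodes/ComfyUI-GGUF remote get-url origin)" \
#     && echo "patched fork" || echo "stock node"
#   pbpaste | looks_like_json && echo "JSON" || echo "plain text"
```

If the second check reports plain text for what Gemma produced, the grey image is likely coming from the prompt rather than from the renderer.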

Notes

  • Ideogram 4 has a non-commercial license; respect it.

  • Developed with AI assistance and verified on-device on Apple Silicon.

  • Forks / upstream: github.com/fxd0h/ComfyUI-GGUF (city96/ComfyUI-GGUF #454/#455/#456), github.com/fxd0h/ComfyUI-LLM-text-processor.
