The Snap Machine
Model description
The Snap Machine is a fully self-contained ComfyUI workflow that generates an image and a fitting social media caption. It first generates the image, then analyzes it with BLIP, and finally uses an LLM to write a caption based on the contents of the image.
How to Use The Snap Machine
1️⃣ Generate an Image – First, create an image with The Snap Machine disabled to get a clean base output.
2️⃣ Lock the Seed – Once you have an image you like, lock the seed so you can keep using that exact image.
3️⃣ Generate a Caption – Enable The Snap Machine to let BLIP analyze the image and the LLM refine that description into a more natural, engaging caption based on a custom prompt. Keep generating to explore different options. If captions get cut off, some fine-tuning may be required: raising the token limit or tweaking the LLM settings can help.
4️⃣ Adjust the Position – If the caption ends up over the face or in a bad spot, lock the Snap Machine seed and keep generating. This will randomly place the caption in different positions until you find one that works.
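To make the seed behavior in step 4 concrete, here is a minimal sketch of seeded random placement. It is not the workflow's actual placement node: the function name `pick_caption_position`, the box sizes, and the use of Python's `random` module are all assumptions made for illustration.

```python
import random

def pick_caption_position(image_size, box_size, seed):
    """Return a top-left (x, y) for a caption box placed at random inside the image.

    Hypothetical helper: the real workflow handles placement inside its own nodes.
    """
    img_w, img_h = image_size
    box_w, box_h = box_size
    if box_w > img_w or box_h > img_h:
        raise ValueError("caption box is larger than the image")
    rng = random.Random(seed)  # a fixed seed always yields the same position
    x = rng.randint(0, img_w - box_w)
    y = rng.randint(0, img_h - box_h)
    return x, y

# The same seed reproduces the same spot; a different seed draws a new one.
first = pick_caption_position((1024, 1024), (600, 80), seed=1)
again = pick_caption_position((1024, 1024), (600, 80), seed=1)
moved = pick_caption_position((1024, 1024), (600, 80), seed=2)
print(first, again, moved)
```

The point of the sketch is that a random draw tied to a seed is repeatable, which is why locking one seed while another keeps changing lets you hold one part of the result steady and vary the rest.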
How It Works
The Snap Machine works by guiding the LLM with a pre-prompt that tells it how to use BLIP’s output to generate the final caption.
1️⃣ BLIP analyzes the image and creates a basic description of what’s in it.
2️⃣ The pre-prompt sets the style and tone for the LLM, telling it how to rewrite the BLIP output into a natural caption.
3️⃣ Both the BLIP description and the pre-prompt are sent into the LLM node, which refines them into a final caption.
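The three steps above can be sketched in plain Python. This is only an illustration of the data flow, not the workflow's node graph: `build_llm_prompt`, `write_caption`, the prompt layout, and the stand-in `fake_llm` are hypothetical names, and the example pre-prompt is not the one shipped with the workflow.

```python
from typing import Callable

def build_llm_prompt(pre_prompt: str, blip_description: str) -> str:
    """Combine the style instructions and the BLIP description into one prompt."""
    return (
        f"{pre_prompt.strip()}\n\n"
        f"Image description: {blip_description.strip()}\n"
        "Caption:"
    )

def write_caption(blip_description: str, pre_prompt: str,
                  llm: Callable[[str], str]) -> str:
    """Send the combined prompt to the LLM and tidy up the reply."""
    prompt = build_llm_prompt(pre_prompt, blip_description)
    return llm(prompt).strip()

def fake_llm(prompt: str) -> str:
    # Stand-in for the LLM node so the sketch runs without a model.
    return " Golden hour never misses. "

# Assumed example values; in the workflow these come from BLIP and the pre-prompt node.
PRE_PROMPT = "Rewrite the image description below as a short, casual social media caption."
BLIP_TEXT = "a woman standing on a beach at sunset"

caption = write_caption(BLIP_TEXT, PRE_PROMPT, fake_llm)
print(caption)
```

Because the pre-prompt is just text placed ahead of the description, changing its wording is the cheapest way to change the tone of every caption.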
You can adjust three key areas to fine-tune results:
The pre-prompt (to change how the LLM uses BLIP’s output)
BLIP’s settings (to control how it describes the image)
LLM parameters (to tweak length, randomness, and phrasing)
This gives you full control over how the caption feels and sounds, letting you customize it for different styles.
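When tuning the LLM parameters, the most common symptom is a caption that stops mid-sentence. The sketch below shows one way to reason about that: check whether the text ends cleanly and, if not, retry with a larger token limit. The `generate` callable, the `max_tokens` name, and the punctuation heuristic are assumptions for illustration; the actual LLM node may expose different settings, and captions that end in an emoji or hashtag would need a looser check.

```python
from typing import Callable

# Characters that suggest the caption finished on its own (crude heuristic).
TERMINAL = (".", "!", "?", "…", '"')

def looks_truncated(caption: str) -> bool:
    """Return True if the caption is empty or does not end on terminal punctuation."""
    text = caption.strip()
    return not text or not text.endswith(TERMINAL)

def caption_with_retries(generate: Callable[[str, int], str], prompt: str,
                         max_tokens: int = 32, limit: int = 256) -> str:
    """Double the token limit until the caption ends cleanly or the limit is hit."""
    caption = generate(prompt, max_tokens)
    while looks_truncated(caption) and max_tokens * 2 <= limit:
        max_tokens *= 2
        caption = generate(prompt, max_tokens)
    return caption.strip()

FULL_CAPTION = "Chasing sunsets and good vibes all summer long."

def fake_generate(prompt: str, max_tokens: int) -> str:
    # Stand-in for the LLM: treats each word as one token and cuts off at the limit.
    return " ".join(FULL_CAPTION.split()[:max_tokens])

result = caption_with_retries(fake_generate, "any prompt", max_tokens=4)
print(result)
```

In the workflow itself the same idea applies by hand: if the caption ends abruptly, raise the token limit and generate again.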
Resources
Setting up an LLM in comfy:
The LLM I use is the Q4_K_S quantization of Toppy-M-7B, from: https://huggingface.co/TheBloke/Toppy-M-7B-GGUF/tree/main
Notes
The positive prompt section is built from three nodes so that a wildcard processor can sit in the middle. This setup introduces controlled variation in that middle part while the rest of the prompt keeps its structure.
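As a rough illustration of the three-part prompt, the sketch below joins a prefix, a wildcard-expanded middle, and a suffix. The `{a|b|c}` choice syntax is a common wildcard convention and is assumed here; the wildcard node used in the workflow may support a different or richer syntax, and all function names and example phrases are hypothetical.

```python
import random
import re

# Matches one {option|option|...} group with no nesting.
CHOICE = re.compile(r"\{([^{}]+)\}")

def expand_wildcards(text: str, rng: random.Random) -> str:
    """Replace each {a|b|c} group with one randomly chosen option."""
    return CHOICE.sub(lambda m: rng.choice(m.group(1).split("|")), text)

def build_positive_prompt(prefix: str, wildcard_part: str, suffix: str, seed: int) -> str:
    """Join the three prompt parts, expanding wildcards only in the middle one."""
    rng = random.Random(seed)
    middle = expand_wildcards(wildcard_part, rng)
    parts = (prefix, middle, suffix)
    return ", ".join(part.strip() for part in parts if part.strip())

prompt = build_positive_prompt(
    "photo of a woman",
    "{at the beach|in a cafe|on a rooftop}, {smiling|laughing}",
    "natural light, candid",
    seed=7,
)
print(prompt)
```

Only the middle part varies between runs, so the subject and the overall look stay consistent while the setting and expression change.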





