Gemini, prompt generator

Uses a custom LLM prompt to analyze an image and output its structure as a prompt suitable for an i2v (image-to-video) model.

+It can also be used with Hunyuan, but in that case it is recommended to exclude prompts related to camera motion.

A Gemini API key is required (free to obtain).

Also, enter your API key into the JSON file located at ComfyUI_windows_portable\ComfyUI\custom_nodes\comfyui-ollamagemini\config.json
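
If you prefer to set the key from a script rather than editing the file by hand, a minimal sketch follows. The name of the field that holds the key is not stated here, so `key_name` is a hypothetical parameter: open config.json first and check which field the node actually reads.

```python
import json
from pathlib import Path


def set_config_value(config_path, key_name, value):
    """Load a JSON config, set one top-level field, and write it back."""
    path = Path(config_path)
    # Start from the existing settings so other fields are preserved.
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    config[key_name] = value
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config


# Example call (the field name is an assumption, replace it with the real one):
# set_config_value(
#     r"ComfyUI_windows_portable\ComfyUI\custom_nodes\comfyui-ollamagemini\config.json",
#     "<field name from the file>",
#     "YOUR_API_KEY",
# )
```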

25.05.15 - The free tier for Gemini Pro is no longer available, so you must now use a Flash version (2.0 Flash or 2.5 Flash).

25.05.26 - As of this date, the latest Gemini Flash version is gemini-2.5-flash-preview-05-20.

[change logs]

25.08.23/Gemini I2V Prompt for Wan2.2 V2 (txt file)
I modified one of the jailbreak prompts for i2v. The tests were based on images of fully nude females of the Gemini 2.5 Pro and 2.5 Flash models. However, if you enter text in English, it is likely to be censored. -> Make a request for a text in a language other than English.

~~25.07.30/Gemini I2V Prompt for Wan2.2 V1 (txt file)~~ [Prompt censored]
NSFW images can also be analyzed in Gemini 2.5 Pro/2.5 Flash models.
This works like an RP because it is a version of the NSFW RP prompt designed to provide high-level censorship relief.

25.07.01/Gemini Video/Image Captioning UI beta

This tool processes multiple video and image files using a queue. It features a 3-stage captioning pipeline (individual frames, composite summary, and final rewrite) to generate a clean .txt caption and a detailed .json log for each file.
You have full control over the process. Adjust frame sampling by FPS or a total frame limit. Customize all prompts and save them as templates. An optional video splitting mode is available for very long files.
It includes a robust fallback system that automatically cycles through multiple API keys and models to avoid rate-limit errors and ensure tasks complete. You can also fine-tune performance with settings for API delay and concurrent workers.
Manage everything in an intuitive GUI with real-time logging. All your settings are saved on exit and reloaded on launch.
To run this tool, you need to install the required libraries with the following command:
pip install PyQt5 opencv-python google-generativeai
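
The frame sampling options mentioned above (by FPS or by a total frame limit) can be illustrated with a small sketch. This is not the tool's actual code; the function name and parameters are hypothetical and only show one plausible way to combine the two settings.

```python
def sample_frame_indices(total_frames, video_fps, sample_fps=None, max_frames=None):
    """Pick frame indices by target FPS, then cap them at a total frame limit."""
    if total_frames <= 0 or video_fps <= 0:
        return []
    if sample_fps and sample_fps > 0:
        # Take one frame every `step` source frames (never less than 1).
        step = max(1, round(video_fps / sample_fps))
    else:
        step = 1
    indices = list(range(0, total_frames, step))
    if max_frames and len(indices) > max_frames:
        # Thin the list evenly so the kept frames still span the whole clip.
        stride = len(indices) / max_frames
        indices = [indices[int(i * stride)] for i in range(max_frames)]
    return indices
```

For a 10-second clip at 30 FPS, `sample_frame_indices(300, 30, sample_fps=1, max_frames=5)` keeps five evenly spaced frames. With opencv-python, each index could then be read by seeking with `cv2.CAP_PROP_POS_FRAMES` before calling `read()`.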

Please be aware that, due to an unintended logic issue, the Start Processing and Stop Processing buttons and the output directory setting may be inconvenient to use in the current version. This will be improved in a future update.
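
The fallback behavior described for this tool (cycling through API keys and models, with an optional delay) might look roughly like the sketch below. It is an illustration under stated assumptions, not the tool's implementation: `request_fn` is a hypothetical callable that performs one API request, and the broad `except` stands in for whatever rate-limit error the real client raises.

```python
import itertools
import time


def call_with_fallback(request_fn, api_keys, models, delay_seconds=0.0):
    """Try each (key, model) pair in turn until one call succeeds."""
    last_error = None
    for api_key, model in itertools.product(api_keys, models):
        try:
            return request_fn(api_key, model)
        except Exception as error:  # a real tool would catch only rate-limit errors
            last_error = error
            time.sleep(delay_seconds)  # pause before trying the next pair
    raise RuntimeError("all API key/model combinations failed") from last_error
```

Pairs are tried with the key as the outer loop, so every model is attempted with the first key before moving on to the second.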

25.06.30/Standalone Gemini UI v2
The UI is now more convenient to use, and the templates include WAN 2.1 i2i v1.21b and FLUX kontext i2i prompts. Before using this program, install the required libraries with the following command:
pip install customtkinter google-generativeai pillow pyperclip googletrans==4.0.0-rc1 requests

25.05.30/v1.21b for Wan2.1 I2V
i2v Update: Precise action control (new syntax/structure), camera impact reduced for motion focus; increased NSFW refusals possible.

25.05.21/Standalone Gemini UI (v1.1) - The existing ZIP file has been updated. Please re-download it if you need the latest version.

The default prompt has been modified, allowing for normal use of both gemini-2.0-flash and gemini-2.5-flash-preview-04-17 versions.

However, NSFW image analysis is only available with gemini-2.0-flash(However, sometimes 2.5 flash is also available), and there may be occasional instances where analysis is unsuccessful. (In such cases, please retry the analysis. It will definitely work.)

Additionally, a final prompt translation feature has been added. As a result, the installation command has changed to the following:

pip install google-generativeai customtkinter Pillow tkinterdnd2-Universal googletrans==3.1.0a0

25.05.17/Standalone Gemini UI

This program offers a dedicated user interface for leveraging Google's Gemini, completely independent of ComfyUI workflows.

Why a Separate UI?

This tool was specifically developed to address a common challenge faced when performing image analysis in ComfyUI: the unloading of WAN (or other generative) models. This unloading process can lead to significant delays when you want to switch back to image generation. By using this standalone UI for image analysis with Gemini, you can keep your primary generative models loaded in ComfyUI, saving time and improving your workflow efficiency.

Default Prompts (via gemini_app_settings.json)

If you include the provided gemini_app_settings.json file in the same folder as the application, it will automatically load a default prompt set (e.g., configured for "v1.2a wan2.1 i2v" or your specified default). You can, of course, modify this or use your own prompts within the UI.
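
The load-if-present behavior described above can be sketched as follows. The structure of gemini_app_settings.json is not documented here, so this example treats it as a flat JSON object and the `system_prompt` field in the usage note is hypothetical.

```python
import json
from pathlib import Path

SETTINGS_FILENAME = "gemini_app_settings.json"


def load_settings(app_dir, defaults=None):
    """Return settings from app_dir/gemini_app_settings.json, or defaults if absent."""
    settings = dict(defaults or {})
    path = Path(app_dir) / SETTINGS_FILENAME
    if path.is_file():
        try:
            loaded = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            return settings  # unreadable file: keep the built-in defaults
        if isinstance(loaded, dict):
            settings.update(loaded)  # values from the file override defaults
    return settings
```

For example, `load_settings(".", {"system_prompt": ""})` returns the defaults unchanged when no settings file sits next to the script.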

Getting Started - Installation

To run this application, you may need to install a few Python libraries. Open your command prompt (CMD) or terminal and enter the following command:

pip install google-generativeai customtkinter Pillow tkinterdnd2-Universal

How to Run

Ensure you have Python installed on your system.
Install the required libraries using the pip install commands above.
Place the prompts.json file (if you have one for default prompts) in the same directory as the Python script.
Run the script. To run with a visible console window: python gemini_ui.py
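
Before launching, you can check that the libraries are importable. The sketch below is a convenience check, not part of the application; the import names in `required` are assumed to correspond to the pip packages listed above and may need adjusting.

```python
import importlib.util


def missing_modules(module_names):
    """Return the names from module_names that cannot be imported."""
    missing = []
    for name in module_names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            found = False  # parent package missing or malformed entry
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Import names assumed for the pip packages listed above.
    required = ["google.generativeai", "customtkinter", "PIL", "tkinterdnd2"]
    print("Missing:", missing_modules(required) or "none")
```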

NSFW images analysis
If you are analyzing NSFW images, add the relevant content description to the very bottom of the "System Prompt" field.

[**User Input**: (Your Prompt)]

=====

25.05.14/v1.0b Joy caption for i2v
Full, uncensored image analysis and i2v prompt generation is achieved using JoyCaption. The resulting natural motion behavior is distinct and, in some cases, may not reach the same level of fluidity as the Gemini 2.0 Flash (for which an almost flawlessly uncensored version has previously been established).
huggingface demo: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
github: https://github.com/fpgaminer/joycaption

25.05.05/v1.2a for i2v, v1.1a for start-end, v1.0a for Framepack
This version has been updated to align with the recently revised custom node and to ensure the analysis of NSFW images or prompts.
+I've modified some custom nodes that can't be found in the Manager, so installing custom nodes should no longer be a hassle.
+The latest version of the ollamagemini custom node is required.

25.04.18/v1.0 for start/end
Resolved an issue resulting in excessively lengthy final prompts; improved the coherence and visual connectivity for transitions between start and end frames, and added a translation node.

25.04.18/v1.0 for FramePack
Creates a very simple prompt.
https://github.com/lllyasviel/FramePack

25.04.14/v1.1 for i2v
Fixed an issue caused by an overly long and unnecessary final prompt, and adjusted to avoid consecutive API calls.
*25.04.15/v1.1a - Add translation node

25.03.19/v1.0
Fixed a single incorrect symbol in the LLM prompt. This is a minor change, but it may slightly reduce problems that occur when inputting text in languages other than English. Additionally, the default setting for the stream option has been changed from ON to OFF.

25.03.25/for start-end frame(beta) -> beta+ (Improved results by modifying some of the prompts)
kijai workflow
Analyses the start and end images and ultimately generates an appropriate prompt for use in the i2v start-end workflow. However, depending on the image or motion, the end frame may not work properly. (If you can input the additional motion correctly, you can reinforce the intermediate movement using the existing v1.0 workflow.)

Model Type	Other
Base Model	Other
Published	2025-07-30
