LuminaYume (Lumina Image 2.0)
Model description
I. Overview
This model was trained to generate both realistic human images and high-quality anime-style images. Although it was fine-tuned on a specific dataset, it retains a significant amount of knowledge from the base model.
Key Features:
Supports anime image generation using Danbooru tags
Improved accuracy in placing objects correctly within the image based on prompt descriptions
Preserves a good portion of the base model's original knowledge
Limitations:
For version 0.1:
Text generation inside images is still inaccurate.
Output image quality is currently moderate and may vary depending on prompts.
Understanding of specific character prompts via Danbooru tags is limited.
II. Model Components:
Text Encoder: Pretrained Gemma-2-2B
VAE: Flux.1 dev VAE
Image Backbone: Fine-tuned version of Lumina's backbone
Trained on a diverse 30M-image dataset including:
Anime images (tagged with Danbooru tags)
Realistic human photos
Text-containing images
Images with detailed spatial annotations
III. File Information
This all-in-one file includes the weights for the VAE, text encoder, and image backbone. It is compatible with ComfyUI and other systems that support custom pipelines.
If you'd like to use this model via Hugging Face's diffusers library, click here for more details.
IV. Suggested Settings
System Prompt
For anime (Danbooru tags):
You are an advanced assistant designed to generate high-quality images from user prompts, utilizing danbooru tags to accurately guide the image creation process.
You are an assistant designed to generate high-quality images based on user prompts and danbooru tags.
For general use:
You are an assistant designed to generate superior images with the superior degree of image-text alignment based on textual prompts or user prompts.
You are an assistant designed to generate high-quality images with the highest degree of image-text alignment based on textual prompts.
Recommended Settings
CFG: 3–6
Sampling Steps: 40–50
Sampler: Euler a
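The settings above can be collected into a small helper, sketched here in Python. This is a minimal illustration, not part of the model's own tooling: the names `SYSTEM_PROMPTS`, `SamplerSettings`, and `build_prompt` are hypothetical, the default CFG of 4.5 is simply the midpoint of the recommended range, and the ` <Prompt Start> ` separator between system prompt and user prompt is an assumption based on how Lumina Image 2.0 prompts are commonly formatted. Check your own frontend, since some tools insert the system prompt for you.

```python
from dataclasses import dataclass

# System prompts copied from this card (one variant per use case).
SYSTEM_PROMPTS = {
    "anime": (
        "You are an assistant designed to generate high-quality images "
        "based on user prompts and danbooru tags."
    ),
    "general": (
        "You are an assistant designed to generate high-quality images with "
        "the highest degree of image-text alignment based on textual prompts."
    ),
}

# Assumption: separator placed between the system prompt and the user prompt.
PROMPT_SEPARATOR = " <Prompt Start> "


@dataclass(frozen=True)
class SamplerSettings:
    """Recommended ranges from this card; defaults are illustrative picks."""

    cfg: float = 4.5         # recommended range: 3 to 6
    steps: int = 40          # recommended range: 40 to 50
    sampler: str = "Euler a"

    def __post_init__(self):
        # Reject values outside the ranges recommended on this card.
        if not 3.0 <= self.cfg <= 6.0:
            raise ValueError(f"cfg {self.cfg} is outside the recommended 3-6 range")
        if not 40 <= self.steps <= 50:
            raise ValueError(f"steps {self.steps} is outside the recommended 40-50 range")


def build_prompt(user_prompt: str, style: str = "anime") -> str:
    """Prefix the user prompt with the system prompt for the chosen style."""
    if style not in SYSTEM_PROMPTS:
        raise KeyError(f"unknown style {style!r}; expected one of {sorted(SYSTEM_PROMPTS)}")
    return SYSTEM_PROMPTS[style] + PROMPT_SEPARATOR + user_prompt.strip()


if __name__ == "__main__":
    settings = SamplerSettings()
    print(build_prompt("1girl, solo, looking at viewer", style="anime"))
    print(settings)
```

Values outside the recommended ranges may still work; the range checks only flag departures from the settings listed on this card.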
V. Notes & Feedback
This is an experimental release, and I plan to improve it in future versions.
Feedback, suggestions, and prompt ideas are always welcome. Your support helps make this better!
In addition to English prompts, this model also supports prompts in Chinese and Japanese.
VI. Acknowledgments
Big thanks to narugo1992 for the dataset contributions.
Credit to Alpha-VLLM for the fantastic base model architecture.
Shoutout to AngelBottomless and his team for sharing their experiments with Lumina-Illustrious, which helped guide parts of this project.
If you'd like to support my work, you can do so through Ko-fi!




















