LuminaYume (Lumina Image 2.0)

I. Overview

This model was trained to generate both realistic human images and high-quality anime-style images. Although it was fine-tuned on a specific dataset, it retains a significant amount of the base model's knowledge.

Key Features:

  • Supports anime image generation using Danbooru tags

  • Improved accuracy in placing objects correctly within the image based on prompt descriptions

  • Preserves a good portion of the base model's original knowledge

Limitations:

For version 0.1:

  • Text generation inside images is still inaccurate.

  • Output image quality is currently moderate and may vary depending on prompts.

  • Understanding of specific character prompts via Danbooru tags is limited.

II. Model Components

  • Text Encoder: Pretrained Gemma-2-2B

  • VAE: Flux.1 dev VAE

  • Image Backbone: Fine-tuned version of Lumina's backbone

  • Trained on a diverse 30M-image dataset including:

    • Anime images (annotated with Danbooru tags)

    • Realistic human photos

    • Text-containing images

    • Images with detailed spatial annotations

III. File Information

This all-in-one file includes the weights for the VAE, text encoder, and image backbone. It is compatible with ComfyUI and other systems that support custom pipelines.

If you'd like to use this model via Hugging Face's diffusers library, click here for more details.

IV. Suggested Settings

System Prompt

  • For anime (Danbooru tags):

    • You are an advanced assistant designed to generate high-quality images from user prompts, utilizing danbooru tags to accurately guide the image creation process.

    • You are an assistant designed to generate high-quality images based on user prompts and danbooru tags.

  • For general use:

    • You are an assistant designed to generate superior images with the superior degree of image-text alignment based on textual prompts or user prompts.

    • You are an assistant designed to generate high-quality images with the highest degree of image-text alignment based on textual prompts.
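
As a rough illustration of how one of the system prompts above might be combined with a user prompt, here is a minimal Python sketch. The `<Prompt Start>` separator is an assumption based on the prompt template commonly used with upstream Lumina Image 2.0 and is not stated on this page. `SYSTEM_PROMPTS`, `PROMPT_START`, and `build_prompt` are hypothetical names for illustration only. Check what your frontend expects before relying on this format.

```python
# System prompts quoted from this model card (second variant of each style).
SYSTEM_PROMPTS = {
    "anime": (
        "You are an assistant designed to generate high-quality images "
        "based on user prompts and danbooru tags."
    ),
    "general": (
        "You are an assistant designed to generate high-quality images with the "
        "highest degree of image-text alignment based on textual prompts."
    ),
}

# ASSUMPTION: separator taken from the upstream Lumina Image 2.0 prompt
# template; this model card does not specify it.
PROMPT_START = "<Prompt Start>"


def build_prompt(user_prompt: str, style: str = "general") -> str:
    """Prefix the user prompt with the system prompt for the given style."""
    system_prompt = SYSTEM_PROMPTS[style]  # raises KeyError for unknown styles
    return f"{system_prompt} {PROMPT_START} {user_prompt.strip()}"


if __name__ == "__main__":
    # Danbooru-style tags go in the user prompt when using the anime style.
    print(build_prompt("1girl, solo, long hair, looking at viewer", style="anime"))
```

The resulting string would be pasted into the positive prompt field of the frontend. Some pipelines accept the system prompt as a separate argument instead, in which case only the user prompt part is needed.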

Recommended Settings

  • CFG: 3–6

  • Sampling Steps: 40–50

  • Sampler: Euler a
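
For scripted use, the recommended settings above could be captured in a small configuration object. This is only a sketch: `SamplingSettings` and `in_recommended_range` are hypothetical names, and the midpoint defaults are an arbitrary choice within the recommended ranges, not values suggested by the author.

```python
from dataclasses import dataclass

# Ranges quoted from this model card.
CFG_RANGE = (3.0, 6.0)
STEPS_RANGE = (40, 50)


@dataclass(frozen=True)
class SamplingSettings:
    """Hypothetical container for the recommended sampling settings."""

    cfg: float = 4.5          # arbitrary pick: midpoint of the 3-6 range
    steps: int = 45           # arbitrary pick: midpoint of the 40-50 range
    sampler: str = "Euler a"  # sampler name as written on this page

    def in_recommended_range(self) -> bool:
        """Return True if cfg and steps fall inside the ranges listed above."""
        cfg_ok = CFG_RANGE[0] <= self.cfg <= CFG_RANGE[1]
        steps_ok = STEPS_RANGE[0] <= self.steps <= STEPS_RANGE[1]
        return cfg_ok and steps_ok


if __name__ == "__main__":
    settings = SamplingSettings(cfg=5.0, steps=40)
    print(settings, settings.in_recommended_range())
```

Sampler names differ between frontends (ComfyUI, for example, lists this sampler as `euler_ancestral`), so the `sampler` string may need to be mapped to whatever the target tool expects.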

V. Notes & Feedback

This is an experimental release, and I plan to improve it in future versions.
Feedback, suggestions, and prompt ideas are always welcome. Your support helps make this better!

In addition to English prompts, this model also supports prompts in Chinese and Japanese.

VI. Acknowledgments

  • Big thanks to narugo1992 for the dataset contributions.

  • Credit to Alpha-VLLM for the fantastic base model architecture.

  • Shoutout to AngelBottomless and his team for sharing their experiments with Lumina-Illustrious, which helped guide parts of this project.

If you'd like to support my work, you can do so through Ko-fi!
