Hikarimagine XL

Model description

This is an experimental model based on Animagine XL 4.0.

Original model developed by Cagliostro Research Lab

License: Open RAIL++

I wanted to experiment a bit after reading this article: https://www.reddit.com/r/StableDiffusion/comments/1o1u2zm/text_encoders_in_noobai_are_dramatically_flawed_a/ . It may be a much better idea to train the text encoder beforehand and then keep it frozen during training. I merged the CLIP-L into Animagine XL 4.0, then trained it further to fix the broken images and bring its knowledge up to around May 2025, using 700k images for 3 epochs. This is probably not sufficient yet, so I might do another training run to bring it up to September 2025.
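
For a rough sense of scale, the run described above (700k images, 3 epochs) can be turned into sample and step counts. This is only a back-of-the-envelope sketch: the batch size of 64 is a hypothetical value chosen for illustration, not the setting actually used, and `training_steps` is a made-up helper name.

```python
import math


def training_steps(num_images: int, epochs: int, batch_size: int) -> tuple[int, int]:
    """Return (samples_seen, optimizer_steps) for a plain epoch-based run.

    Assumes one optimizer step per batch (no gradient accumulation) and
    that the last, partially filled batch of each epoch is still used.
    """
    steps_per_epoch = math.ceil(num_images / batch_size)
    return num_images * epochs, steps_per_epoch * epochs


# 700k images and 3 epochs come from the description above;
# batch_size=64 is an assumption for illustration only.
samples, steps = training_steps(num_images=700_000, epochs=3, batch_size=64)
print(f"samples seen: {samples:,}")
print(f"optimizer steps at batch size 64: {steps:,}")
```

Changing `batch_size` changes the step count but not the number of samples seen, which stays at 2.1 million for this dataset size and epoch count.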

I also built a simple platform where you can generate images for free: https://miyukiai.com/

However, due to the limited number of GPUs, waiting times may be longer.
If you like my work, a donation helps support my model development and keeps the platform free: https://ko-fi.com/suzushi2024

What follows is just a short note with my thoughts on AI models in general and my future plans at the moment.
The original plan was to develop a good SD3.5 Medium anime base model, since a few months ago there were many projects revolving around it. With several base models and LoRAs, there could be a really good ecosystem for SD3.5M. However, many of these projects seem to have been cancelled or to have failed during training. In addition, with the new changes, all SD3/3.5 series models have been removed from Civitai again. Nonetheless, I will keep updating this series on Hugging Face if anyone is interested: https://huggingface.co/collections/suzushi/miso-diffusion-m.

I am also looking forward to training another, smaller DiT base model. So far Lumina seems promising. The majority of DiT models are larger, and having to wait at least 90 seconds for an image on an RTX 4080 or similar is not only too long; most people don't have that kind of hardware either. So the goal is to pick a small and robust model. I have begun preparing to fine-tune the text encoder, but more preparation is needed. I also ran another experimental training on Sana. While it allowed faster generation, its small parameter count means it is more likely to generate flawed images and body horror (especially on the hands). It is also slower at capturing fine details, so I don't think it is suitable for the next model.
