GAOGAO-LUMINA
A simple fine-tune of NETA-LUMINA
What is it?
Simply put: it is fine-tuned from the NETA-LUMINA base model on a manually curated selection of 10,000 images. This model is the final result.
What can it do?
It is similar to NETA-LUMINA, but image quality is slightly better and it adds more detail to the pictures.
Why is it V0.1?
Well... that's hard to say. While training this model, the cloud machine reported an error and the run failed. I'll need some time to restart training, and I also hope to use a more rigorous and methodical approach next time.
How should it be used?
In a sentence: start with 1girl/1boy, then follow with natural language. However, I do not recommend using this model on its own; I strongly suggest combining it with other style LoRAs. Tags help to a certain extent, but they won't unleash the model's full potential.
If this is your first time using a NETA-LUMINA model, I recommend checking out the official tutorial written by NETA.ART. Unlike earlier models such as SDXL (ILL/NOOB) or SD1.5, which rely heavily on tags as prompts, in NETA-LUMINA your prompts should be primarily natural language.
Also, what I am providing here is only the main model file. You will still need to download the VAE and gemma2 (the text encoder) separately.
Future plans?
The first goal is to release a version 1.0.
Support me?
Join QQ group 1020622167 to hang out and chat.
Below are some ramblings: common knowledge and tips about NETA-LUMINA, drawn mostly from conversations with others, my own observations, and other people's experience. There may be inaccuracies, so feel free to leave your own experiences or insights in the comments.
NETA-LUMINA is a natural-language model, which means its support for tags is actually very weak. Although reports suggest that tags made up nearly 20% of the training data, in practice the model's handling of tags can be described as disastrous. A fairly plausible explanation is this: Lumina uses Gemma as its text encoder, and Gemma's tokenizer was not designed to parse tags, so the tags you input get sliced into a very fine-grained pile of subword pieces.
When training a LoRA, you can indeed use pure tags, but the price is very slow fitting and quality that doesn't justify the cost (if you are very rich, we can ignore this point).
The system prompt is necessary. It acts more like a trigger word: since it was present throughout the training of both the base model and LoRAs, there is no reason to remove it at generation time.
Regarding artist tags, the reason many of them don't respond or even have negative effects is mentioned above: the LLM's vocabulary doesn't actually contain tokens for these artist names, so they are bound to be broken apart during training. Some artist tags occupy only 2-3 tokens, which is relatively good for style fitting because the tokenizer doesn't fragment them much. My practical observations confirm this: artist tags with fewer tokens fit better, while tags that expand into a terribly long token sequence fit very poorly.
Regarding the knowledge problem: NETA-LUMINA actually possesses a wide range of knowledge, but for various reasons it is difficult to call upon. Based on my observations, the relevant weights may simply be too chaotic; a LoRA should help here.
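The tokenization point above can be illustrated with a toy greedy longest-match subword tokenizer. To be clear, this is not the Gemma tokenizer (which is SentencePiece-based with a vocabulary of hundreds of thousands of pieces); the tiny vocabulary below is invented purely for illustration. The behavior it shows is the relevant one: strings present in the vocabulary survive as one or two pieces, while an unknown artist handle shatters into many tiny fragments, which is the mechanism blamed above for weak artist-tag response.

```python
def subword_tokenize(text: str, vocab: set[str]) -> list[str]:
    """Greedy longest-match subword split; unknown spans fall back to single characters."""
    pieces, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):  # try the longest candidate first
            if text[i:j] in vocab:
                pieces.append(text[i:j])
                i = j
                break
        else:  # no vocabulary match at all: emit one character and move on
            pieces.append(text[i])
            i += 1
    return pieces

# Invented toy vocabulary -- real subword vocabularies are vastly larger.
vocab = {"1girl", "smile", "blue", "sky", "art", "ist"}

print(subword_tokenize("1girl", vocab))      # stays as a single piece
print(subword_tokenize("xyzzynaga", vocab))  # shatters into single characters
```

A tag that survives as one or two pieces behaves like a stable trigger, while a name that fragments into nine character-level pieces gives the model a much noisier signal to associate a style with.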