PonyDiffusion Quality Slider

详情

下载文件

模型描述

This Model Increases "Quality"

Now, you might be wondering "What is 'quality'" and I, unfortunately, can't directly answer that question, however, I can tell you that this model was trained on images that were generated with the abhorrent quality tag monstrosity that Pony Diffusion V6 XL has, but the captions didn't include the quality tag. this means it was trained to emulate the output of adding the quality tag, without adding the quality tag.

Why would I do this?

Well, I was very dismayed to find that the quality tag was so absurdly long and very poorly controllable, so I decided to make a LoRA to control the quality with a slider. this allows a more fine-tuned and varied approach to quality control and additionally saves on tokens in the prompt (which is useful to avoid unwanted BREAKS in attention)

How did I do this?

I used the LECO training script on GitHub by P1atdev based on the LECO paper to train this model, the LECO training process generates an image at an arbitrary denoising strength, and then trains on the difference between the output of the model when prompted for a concept, and when unprompted. this allows the model to alias tags, words, concepts, or phrases to any arbitrary prompt, in this case, I aliased

score_9, score_8_up, score_7_up, score_6_up, score_5_up, score_4_up

the "quality paragraph" to

so I effectively was training it to always produce images that looked as though it had been prompted for the "quality paragraph"

Donations

Speaking of training, training models is expensive and I run training on my own private server, if you like what I do, consider supporting the development here!

https://ko-fi.com/yolup

Key Benefits

  1. One of my favorite key benefits of this approach is that it makes "quality" a modular and controllable thing. adding weight to the quality tag has a somewhat enigmatic effect on the output however, this LoRA/LECO has very clearly defined and comprehensible changes that you can control the severity of by altering the weight of the LoRA/LECO which is an intended operation (as opposed to weighted prompts being a hack applied to the attention layer that doesn't always have the desired effect)

  2. The other benefit is that this LoRA/LECO doesn't use up the 33 tokens that the "quality paragraph" uses! that's like half of an entire context window! eating up the context window forces the backend you are using, whether it be A1111, InvokeAi, or ComfyUI to add invisible BREAKs in attention that damage the overall coherence of the prompt you are constructing and can lead to other unintended consequences.

Quirks

v1 of this model is quite impotent and seems to operate stably with a weight between 2-3, but it continues to produce outputs that are recognizable up to weight 6

v3 has standard weight behavior, use it like you might a normal LoRA

v3 requires you to use a rating and source tag in your prompt

This model was only trained and tested on PonyDiffusion V6 XL! I do not guarantee compatibility with other models!

v1 of the model definitely alters backgrounds to appear more "painterly" and it means the backgrounds will fall apart significantly faster than the subject at higher strengths. if you care a lot about the background of an image, you might want to only use this model supplemental.

I have yet to fully test v4, please let me know of any strange behavior exhibited by it.

While I have tried to isolate the "quality" concept in the model and remove it from similar concepts, it is only possible to go so far with it. It may alter the content of your generation in unintended ways. if you wish to discuss these abnormalities, please go to the furry diffusion server on Discord to bring them to my attention.
discord.gg/furrydiffusion

Once you have joined, I have a thread made for it here:

https://discord.com/channels/1019133813105905664/1214131180572639312

if you are wondering where v2 is, it was so bad I couldn't disgrace the community with it's publishing.

此模型生成的图像

未找到图像。