RDBT | Anima


Model description

RDBT [Anima]

Finetuned + distilled.

No overfitted default style. Still creative and diverse, and probably has better prompt adherence. I use it as a starting point for stacking more style LoRAs.

See this page for the update log. Random experiments, random quality. New version != better version. Feel free to leave feedback.

For advanced users: the RDBT model is trained natively as a LoRA. See this page for the original LoRA, which is updated more frequently. If you already have the pretrained checkpoint, you don't need to download this checkpoint; just download the LoRA.

This model is based on:

  • Versions prefixed with ym: AnimaYume (hf link) (civitai link). It has the latest dataset and 1536px training. Check the model page for more info.

  • Versions prefixed with b or p: Anima pretrained (hf link)


Sharing merges using this model is not allowed. If someone is selling this model as their own, I'm happy to list them here so everyone knows.

Known model thieves: NukeA.I (behind paywall on tensorart).

I wrote a story about it. It also contains a guide for trainers on how to bake special trigger words into your model.

FYI: This model is trained with a "latent watermark" and "special trigger words". To clarify, I can't track the images it generates; the "latent watermark" is ignored by the VAE. But I can tell whether a model is, or merges, this model, even if it is behind a closed-source paywall.


Usage:

Settings:

CFG scale: 1~3. This model has been guidance distilled, so you can disable CFG (CFG 1) and run the model 2x faster. Cover images are generated without CFG for demonstration.

Steps: 16~24.
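To illustrate why CFG 1 runs about 2x faster, here is a minimal sketch of the usual classifier-free guidance combination. The function names and toy vectors are illustrative assumptions, not any sampler's real API; actual samplers work on latent tensors, and skipping the unconditional pass at scale 1 is assumed behavior.

```python
def cfg_combine(uncond, cond, scale):
    """Standard CFG formula: uncond + scale * (cond - uncond), element-wise."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


def forward_passes_per_step(scale):
    """Assumption: the unconditional pass is skipped when scale == 1."""
    return 1 if scale == 1 else 2


uncond = [0.0, 1.0]  # toy prediction for the negative/empty prompt
cond = [1.0, 3.0]    # toy prediction for the positive prompt

# At scale 1 the result is exactly the conditional prediction,
# so the unconditional pass contributes nothing and can be dropped.
print(cfg_combine(uncond, cond, 1.0))
print(cfg_combine(uncond, cond, 2.0))
print(forward_passes_per_step(1.0), forward_passes_per_step(2.0))
```

With one forward pass per step instead of two, the same number of steps costs roughly half the compute.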

Prompt

Always specify a style, or use a style LoRA. Otherwise, you will get a random or mixed style. This model does not provide an overfitted, shiny, plastic, glossy AI-slop default style. This is a feature, not a bug.

Quality tags:

If you're not confident, it's recommended to omit all quality tags, or keep just "masterpiece". Omitting those redundant tokens allows the LLM to pay more attention to the other words.

Quality tags were reinforced during distillation, so they have no noticeable effect. The same goes for negative tags: if you use CFG, there is no need to dump "score_1, blurry, worst quality, jpeg artifacts, extra arms,... x100 words" into your negative prompt. Those things have been distilled out.


FAQ

FAQ: Anima Turbo and RDBT distillation.

Anima Turbo:

  • Can generate high-quality images in just 4 steps without CFG, 12x faster.

  • Highest stability, lowest diversity.

  • You can get diversity back by lowering the LoRA strength (e.g. 0.5x). However, as the LoRA strength decreases, the model's output strength also decreases, so you will need to enable CFG again (e.g. CFG 1.5); otherwise you will get washed-out or even deformed images. FYI: this is why many models that merged the Turbo LoRA still need CFG.

  • (?) Probably uses some kind of reinforcement learning to boost details. It likes to generate extremely detailed images, often beyond the normal range.

  • Has a huge impact on other style LoRAs.
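The effect of lowering the LoRA strength can be sketched with the common merge formula W' = W + strength * (alpha / rank) * (up @ down). This is a toy illustration with hypothetical names and tiny matrices; whether a given UI applies exactly this formula is an assumption.

```python
def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def apply_lora(weight, up, down, strength, alpha, rank):
    """Add a scaled low-rank update to a base weight matrix."""
    scale = strength * alpha / rank
    delta = matmul(up, down)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


base = [[1.0, 0.0], [0.0, 1.0]]
up = [[1.0], [2.0]]    # shape 2x1 (rank 1)
down = [[2.0, 4.0]]    # shape 1x2

full = apply_lora(base, up, down, strength=1.0, alpha=1, rank=1)
half = apply_lora(base, up, down, strength=0.5, alpha=1, rank=1)
print(full)  # base plus the full update
print(half)  # base plus half of the update
```

At 0.5x strength only half of the distillation update is applied, which fits the point above that a weakened Turbo LoRA needs CFG again.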

RDBT distillation:

  • 12~16 steps without CFG. 3~4x faster.

  • Compared to Anima Turbo: slower and less stable, but with much higher diversity. A trade-off.

  • Due to a lack of resources, artist tags are not in the distillation target. Thus, without CFG, built-in artist styles will be a little weaker. I'm not a gigachad and can't handle 20k+ style tags.

  • However, possibly because of this, it won't completely nuke other style LoRAs the way Anima Turbo does.
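The speed-up figures in both lists are consistent with a single baseline of 24 steps with CFG, where each CFG step costs two forward passes. That baseline is inferred from the quoted numbers, not stated in the text, so treat this as a sketch of the arithmetic only.

```python
BASELINE_STEPS = 24      # assumption: inferred from the quoted speed-ups
PASSES_WITH_CFG = 2      # conditional + unconditional forward pass
PASSES_WITHOUT_CFG = 1   # conditional forward pass only


def speedup(steps, use_cfg=False):
    """Ratio of baseline forward passes to the forward passes actually run."""
    baseline_cost = BASELINE_STEPS * PASSES_WITH_CFG
    passes = PASSES_WITH_CFG if use_cfg else PASSES_WITHOUT_CFG
    return baseline_cost / (steps * passes)


print(speedup(4))    # Anima Turbo, 4 steps, no CFG
print(speedup(12))   # RDBT, 12 steps, no CFG
print(speedup(16))   # RDBT, 16 steps, no CFG
```

Under this assumption, 4 steps gives 12x and 12~16 steps gives 4x~3x, matching the figures in the two lists.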

Neither is strictly better. It depends on what you want.


FAQ: Is a distilled model a bad thing?

Three years ago, it was a bad thing because people didn't know how to distill.

Now things have changed drastically. Distilled models have better quality than the original models, because that's how distillation works: it refines and enhances the "good" features. Distilled models are fast, high-quality, and extremely stable. Their only problem is that "extreme stability", which means less diversity; but it's a feature, not a bug, and most people won't care. All recently released models (zit, flux2 klein, boogu...) have a "turbo" version, and people love the "turbo" version.


FAQ: Training settings

~10k images finetuning -> guidance distillation

All captions are natural language (NL), generated by Google Gemini.

Optimizer: adamw, constant lr 0.00002, weight decay 0.1, batch size 16.

LoRA rank/alpha 24.

Timestep shift: 3.

Blocks 0-2 and the AdaLN linear layers are skipped.
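For reference, the settings above can be collected into a small config sketch. The class and field names are hypothetical, "rank/alpha 24" is read as rank 24 and alpha 24, and the shift function uses the formula common in flow-matching schedulers; none of this is confirmed to match the actual training script.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    optimizer: str = "adamw"
    learning_rate: float = 2e-5   # constant, no schedule
    weight_decay: float = 0.1
    batch_size: int = 16
    lora_rank: int = 24
    lora_alpha: int = 24          # assumption: alpha equals rank
    timestep_shift: float = 3.0


def shift_timestep(t, shift):
    """Common flow-matching shift: shift * t / (1 + (shift - 1) * t), t in [0, 1]."""
    return shift * t / (1.0 + (shift - 1.0) * t)


cfg = TrainConfig()
lora_scale = cfg.lora_alpha / cfg.lora_rank  # effective LoRA scaling factor

print(lora_scale)
print(shift_timestep(0.5, cfg.timestep_shift))  # 0.5 is pushed up to 0.75
```

With this formula a shift above 1 moves sampled timesteps toward the high-noise end, while 0 and 1 stay fixed.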
