32/16(dim/alpha)
16/4(conv dim/alpha)
adafactor(scheduler/optimizer)
true/true/false(relative step/scale parameter/warmup init)
true(shuffle caption, 28% of images in trigger tag folder)
1/0.15(scale weight norms/network dropout)
true(de-biased estimation loss)
0.05/0.1(noise offset/ip noise gamma)
1(batch size)
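For readers who prefer the settings in a single structure, the list above can be restated as a small Python dictionary. This is only a sketch: the key names are illustrative and not tied to any specific trainer, and the `lora_scale` helper is a hypothetical name. It assumes the common LoRA convention that the learned update is multiplied by alpha divided by dim, which shows how strongly the linear and conv layers are scaled with these values.

```python
# Settings copied from the list above.
# Key names are illustrative assumptions, not the flags of a particular trainer.
settings = {
    "network_dim": 32,
    "network_alpha": 16,
    "conv_dim": 16,
    "conv_alpha": 4,
    "optimizer": "adafactor",
    "scheduler": "adafactor",
    "relative_step": True,
    "scale_parameter": True,
    "warmup_init": False,
    "shuffle_caption": True,
    "scale_weight_norms": 1,
    "network_dropout": 0.15,
    "debiased_estimation_loss": True,
    "noise_offset": 0.05,
    "ip_noise_gamma": 0.1,
    "batch_size": 1,
}


def lora_scale(alpha, dim):
    """Return the multiplier applied to the low-rank update (assumed alpha / dim)."""
    return alpha / dim


linear_scale = lora_scale(settings["network_alpha"], settings["network_dim"])
conv_scale = lora_scale(settings["conv_alpha"], settings["conv_dim"])
print(linear_scale, conv_scale)  # 0.5 0.25
```

Under that assumption, the conv layers are scaled half as strongly as the linear layers.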
Caption data was mostly generated by wd-eva02-large-tagger-v3 and pruned with the help of HWtagger and taggui.
Some style tags and character tags were removed to improve training performance.
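The pruning itself was done with HWtagger and taggui, but the idea of dropping selected tags can be sketched in a few lines of Python. This sketch assumes captions are stored as comma-separated tag strings; the function name `prune_caption` and the tag names in the example are hypothetical placeholders, not the tags actually removed.

```python
def prune_caption(caption, blocked_tags):
    """Remove every tag listed in blocked_tags from a comma-separated caption."""
    tags = [t.strip() for t in caption.split(",")]
    kept = [t for t in tags if t and t not in blocked_tags]
    return ", ".join(kept)


# Placeholder tags for illustration only.
blocked = {"hypothetical_style_tag", "hypothetical_character_tag"}
example = "1girl, hypothetical_style_tag, smile, hypothetical_character_tag, outdoors"
print(prune_caption(example, blocked))  # 1girl, smile, outdoors
```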
The model was trained for 64 epochs in one go. A LoRA version was also trained but discarded because of its lower fidelity.