anima_style_test01
Model description
a test of a way to overfit a style into the anima model more easily, without text conditioning
the architecture and code were wholeheartedly vibe-coded with an LLM. it basically mimics IP-Adapter by adding another KV path in the DiT blocks, but instead of using an image encoder for conditioning, the KV are learned directly, together with a lightweight bottleneck projection layer that extracts the style features
the code is based on commit 068bcd7 of the kohya-ss/sd-scripts repo on GitHub
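A rough, framework-free sketch of the idea above, not the actual training code. It assumes the style path is a small set of learned tokens in a bottleneck space that are projected up to keys and values, and that its attention output is added to the regular text attention output the way IP-Adapter's decoupled cross-attention does. `StyleKV`, `dual_kv_attention` and `style_scale` are hypothetical names, and mapping `--num_style_tokens` / `--network_dim` to the token count and bottleneck width is a guess. Single head and a single query vector, to keep the arithmetic visible.

```python
import math
import random


def softmax(scores):
    # Numerically stable softmax over a list of floats.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    # (rows x cols) matrix times a cols-long vector -> rows-long vector.
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def attend(query, keys, values):
    # Single-head scaled dot-product attention for one query vector.
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def rand_matrix(rows, cols, rng, std=0.02):
    return [[rng.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]


class StyleKV:
    """Hypothetical style path: learned tokens -> bottleneck projections -> K, V."""

    def __init__(self, num_style_tokens, bottleneck_dim, model_dim, seed=0):
        rng = random.Random(seed)
        # Learned directly during training; no image encoder is involved.
        self.tokens = rand_matrix(num_style_tokens, bottleneck_dim, rng)
        # Lightweight projections from the bottleneck up to the model width.
        self.to_k = rand_matrix(model_dim, bottleneck_dim, rng)
        self.to_v = rand_matrix(model_dim, bottleneck_dim, rng)

    def keys_values(self):
        keys = [matvec(self.to_k, token) for token in self.tokens]
        values = [matvec(self.to_v, token) for token in self.tokens]
        return keys, values


def dual_kv_attention(query, text_keys, text_values, style, style_scale=1.0):
    # Regular text path plus a second attention over the learned style KV.
    base = attend(query, text_keys, text_values)
    style_keys, style_values = style.keys_values()
    extra = attend(query, style_keys, style_values)
    return [b + style_scale * e for b, e in zip(base, extra)]
```

In this sketch, `style_scale=0.0` gives back the plain text attention output, so the style path is purely additive. Because the same query attends to both paths, a strongly trained style path can also shift what the prompt produces.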
dataset description for each model:
miyajima_reiji_style_dual_kv-000009: manga cover images, taken from Danbooru (low quality, 14 images)
itachi_3dt_style_dual_kv-000012: a mix of art found on Danbooru and colored pages from manga (mix of low and average quality, 14 images)
dramus_style_dual_kv-000008: edited and cropped manga images (average quality, 298 images), colored covers (high quality, 2 images)
all images were resized to 1024x1024 anyway, and even when training on low-quality images the model doesn't output JPEG artifacts, so image quality doesn't seem to matter much (maybe due to the low parameter count?)
the model can overfit a style quite easily, but it still affects prompt activation and guidance (refer to the examples, all were set to seed 42). with the Prodigy optimizer the models overfit faster than when training with AdamW8bit.
trained with this command (Windows cmd):
accelerate launch --num_cpu_threads_per_process 1 anima_train_custom_style.py ^
--pretrained_model_name_or_path="models/diffusion_models/anima-base-v1.0.safetensors" ^
--qwen3="models/text_encoders/qwen_3_06b_base.safetensors" ^
--vae="models/vae/qwen_image_vae.safetensors" ^
--dataset_config="datasets/{DATASET}.toml" ^
--output_dir="output" ^
--output_name="{OUTPUT_NAME}" ^
--save_model_as=safetensors ^
--num_style_tokens=8 ^
--network_dim=64 ^
--learning_rate=1.0 ^
--optimizer_type="Prodigy" ^
--attn_mode="flash" ^
--gradient_checkpointing ^
--lr_scheduler="cosine" ^
--timestep_sampling="sigmoid" ^
--sigmoid_scale=1.0 ^
--sample_prompts="datasets/{SAMPLE_PROMPTS}.txt" ^
--sample_every_n_epochs=1 ^
--max_train_epochs=20 ^
--save_every_n_epochs=1 ^
--mixed_precision="bf16" ^
--cache_latents ^
--cache_latents_to_disk ^
--vae_chunk_size=64 ^
--vae_disable_cache ^
--max_data_loader_n_workers=4
with `gradient checkpointing` enabled, it can be trained overnight on a 5060 laptop with 8GB VRAM
DATASET.toml used:
[general]
caption_extension = ".txt"
shuffle_caption = false
flip_aug = false
color_aug = false

[[datasets]]
resolution = 1024
batch_size = 1
enable_bucket = true
bucket_reso_steps = 16

[[datasets.subsets]]
image_dir = "[DATASET_PATH]"
num_repeats = 20 # for low image counts, turn up repeats to bash the model until it overfits
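To get a feel for how hard `num_repeats` pushes a small dataset, here is a quick back-of-the-envelope calculation. It assumes one optimizer step per batch (no gradient accumulation) and takes the numbers from the config and command above; the helper names are made up for illustration.

```python
import math


def steps_per_epoch(num_images, num_repeats, batch_size=1):
    # Every image is seen num_repeats times per epoch.
    return math.ceil(num_images * num_repeats / batch_size)


def total_steps(num_images, num_repeats, epochs, batch_size=1):
    return steps_per_epoch(num_images, num_repeats, batch_size) * epochs


if __name__ == "__main__":
    # 14 images, num_repeats = 20, batch_size = 1, --max_train_epochs=20
    print(steps_per_epoch(14, 20))        # 280
    print(total_steps(14, 20, epochs=20))  # 5600
```

If the `-000009` suffix on a checkpoint name is the epoch number, as with sd-scripts' per-epoch saves, and the same settings were used, a 14-image model saved at epoch 9 would have seen 9 x 280 = 2520 steps.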
sample prompts used (multiple subjects at different distances):
masterpiece, best quality, 2girls, portrait, close-up, school uniform, serafuku, one girl smiling at viewer, other girl looking away shyly, long hair and short hair, detailed faces, classroom background --w 1024 --h 1024 --d 42
masterpiece, best quality, 2girls, medium shot, standing together, casual clothes, one girl playing acoustic guitar, other girl singing with microphone, music club room, instruments around, dynamic pose --w 1024 --h 1024 --d 42
masterpiece, best quality, 2girls, full body, walking side by side, summer festival, yukata, one girl holding cotton candy, other girl with fan, night market lights, lanterns, crowd in distance --w 1024 --h 1024 --d 42
masterpiece, best quality, 2girls, wide shot, rooftop at sunset, one girl sitting on edge, other girl standing leaning on railing, school uniforms, city skyline background, warm lighting --w 1024 --h 1024 --d 42
masterpiece, best quality, 2girls, long shot, faraway view, fantasy meadow, one girl in mage robe casting spell, other girl in knight armor defending, cherry blossoms floating, dramatic sky, epic scene --w 1024 --h 1024 --d 42
Inference using:
python anima_minimal_inference_style_custom.py ^
--dit "models/diffusion_models/anima-base-v1.0.safetensors" ^
--vae "models/vae/qwen_image_vae.safetensors" ^
--text_encoder "models/text_encoders/qwen_3_06b_base.safetensors" ^
--style_weights "output/[STYLE_WEIGHT].safetensors" ^
--from_file "datasets/[EVAL_PROMPTS].txt" ^
--attn_mode=flash ^
--save_path "[SAVE_PATH]"
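The prompt files used for `--sample_prompts` and `--from_file` put one prompt per line, followed by the `--w`, `--h` and `--d` (seed) options. A small helper like the one below can generate such a file so every prompt gets the same size and seed. The function names and the example path are made up for illustration.

```python
from pathlib import Path


def format_prompt_line(prompt, width=1024, height=1024, seed=42):
    # One line of the prompt file: prompt text, then size and seed options.
    return f"{prompt} --w {width} --h {height} --d {seed}"


def write_prompt_file(path, prompts, **options):
    # Write one formatted line per prompt and return the lines written.
    lines = [format_prompt_line(p, **options) for p in prompts]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines


if __name__ == "__main__":
    prompts = [
        "masterpiece, best quality, 2girls, portrait, close-up, school uniform",
        "masterpiece, best quality, 2girls, wide shot, rooftop at sunset",
    ]
    for line in write_prompt_file("eval_prompts_example.txt", prompts):
        print(line)
```

Keeping the seed fixed across prompts and checkpoints, as in the examples here (seed 42), makes it easier to see what the style weights change rather than what the noise changes.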










