Akanezora | Anima | [WIP]
Akanezora-Anima
Akanezora is a full fine-tune trained entirely on a single RTX 3060 to prove that full fine-tuning is possible on consumer hardware. If you have 12GB of VRAM and some spare time, check out the trainer: [Aozora_SDXL_Trainer]
Version 0.5 Preview
This is a 50% training checkpoint from the first full training run on 36k images (2.73 epochs total). The dataset consists of 60% Danbooru-tagged images with generated natural-language captions and 40% hand-tagged or natural-language-captioned images. During training, the text conditioning was split into 65% tags and 35% natural-language prompts.
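The 65/35 split between tag and natural-language conditioning can be pictured as a per-sample random choice made when a training image is loaded. The sketch below assumes each record carries a tag string and a natural-language caption; the `Record` layout and the `pick_caption` helper are hypothetical and are not taken from the Aozora trainer.

```python
import random
from dataclasses import dataclass

TAG_PROB = 0.65  # 65% tags / 35% natural language, the split stated in this card


@dataclass
class Record:
    # Hypothetical record layout: one image with up to two caption styles.
    tags: str
    natural: str


def pick_caption(record: Record, rng: random.Random) -> str:
    """Return the tag caption with probability TAG_PROB, else the natural-language one.

    If a record only has one caption style, that one is always used.
    """
    if not record.natural:
        return record.tags
    if not record.tags:
        return record.natural
    return record.tags if rng.random() < TAG_PROB else record.natural


if __name__ == "__main__":
    rng = random.Random(0)
    rec = Record(tags="1girl, red sky, sunset", natural="A girl standing under a red evening sky.")
    picks = [pick_caption(rec, rng) for _ in range(10_000)]
    print(f"tag share: {picks.count(rec.tags) / len(picks):.3f}")  # should land near 0.65
```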
While this release is functional, it should be considered a work in progress rather than a finished product. It is being shared early to demonstrate the viability of the training method and to showcase the Aozora trainer’s capability to fine-tune anime DiT models on low VRAM hardware.
Pros:
Reduces unwanted text in generated images by about 80%, removing the need to suppress it with negative prompts.
Produces slightly more varied outputs across seeds, thanks to soft text conditioning.
CFG scale is more forgiving (less sensitive), also thanks to soft conditioning.
Significantly more responsive to tags, since most training captions used Danbooru tags.
Better overall composition and feel across varied prompts.
Better at NSFW content.
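The card does not spell out what soft conditioning does internally. One plausible reading of the "0.75 - 1.25" range under Training Settings, assumed here, is that each sample's text-encoder output is multiplied by a random factor drawn uniformly from that range, so the model sees conditioning at varying strengths. The sketch uses nested lists in place of tensors, and `soften` is a hypothetical name.

```python
import random

SOFT_MIN, SOFT_MAX = 0.75, 1.25  # range listed as "Soft text cond" in this card


def soften(
    embedding: list[list[float]], rng: random.Random
) -> tuple[list[list[float]], float]:
    """Scale every token vector of one sample by a single random factor.

    This is an assumed interpretation of soft text conditioning, not the
    trainer's documented behaviour.
    """
    scale = rng.uniform(SOFT_MIN, SOFT_MAX)
    return [[x * scale for x in token] for token in embedding], scale


if __name__ == "__main__":
    rng = random.Random(42)
    emb = [[0.5, -1.0], [2.0, 0.25]]  # two tokens, two dimensions (toy values)
    scaled, s = soften(emb, rng)
    print(s, scaled)
```

If this reading is right, training against weaker and stronger conditioning would be consistent with the seed-variation and CFG-tolerance effects listed above.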
Cons:
Seed outputs may still closely resemble the base model in certain cases.
Struggles with lighting effects unless explicitly specified in the tags.
Model Transparency Notice
For transparency and reproducibility, all training settings are listed below. This is a complete full fine-tune, and every setting and piece of code used to train the model is open source on my GitHub for all forms of use.
Many models currently marketed as "full fine-tunes" are actually LoRA/LoKR/LyCORIS merges or model merges, which are a known cause of LoRA inconsistencies, layer overfitting, and collapse.
This model:
Training started from unmodified base weights
Zero merge operations applied
No LoRA adapters, ever
Full end-to-end training with no sublayer freezing except the required llm_adapter
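Freezing only the `llm_adapter` weights usually means switching off gradients for every parameter whose name starts with that prefix. In PyTorch that is a loop over `model.named_parameters()` setting `requires_grad`; the stdlib sketch below mirrors the same prefix selection on a plain list of (name, element count) pairs so it runs without a framework. The parameter names and sizes in the example are made up for illustration.

```python
FROZEN_PREFIX = "llm_adapter."  # the only frozen module, per this card


def split_params(params: list[tuple[str, int]]) -> tuple[int, int]:
    """Return (trainable, frozen) element counts, freezing names under FROZEN_PREFIX.

    The PyTorch equivalent of the selection is:
        for name, p in model.named_parameters():
            p.requires_grad = not name.startswith(FROZEN_PREFIX)
    """
    trainable = frozen = 0
    for name, numel in params:
        if name.startswith(FROZEN_PREFIX):
            frozen += numel
        else:
            trainable += numel
    return trainable, frozen


if __name__ == "__main__":
    # Hypothetical parameter table; real names and sizes come from the checkpoint.
    params = [
        ("blocks.0.attn.qkv.weight", 3_000_000),
        ("blocks.0.mlp.fc1.weight", 6_000_000),
        ("llm_adapter.proj.weight", 1_000_000),
    ]
    trainable, frozen = split_params(params)
    print(trainable, frozen, f"{frozen / (trainable + frozen):.2%}")  # 9000000 1000000 10.00%
```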
Training Settings
Base Model: Anima Base 1.0
Training Hours: 104h:55m:17s
GPU Used: NVIDIA GeForce RTX 3060 (12 GB) | Driver version: 32.0.15.9636
VRAM Usage: ~11.4GB
Mixed Precision: bfloat16
Batch Size: 1
Gradient Accumulation: 4
Learning Rate: 5e-6
Timestep: Wave (Freq:1.0 | Phase:3.14 | Amp:0.60)
Optimizer: Raven (AdamW float32 variant with offloading) | Betas: 0.9, 0.999 | Eps: 1e-08 | Weight Decay: 0.01 | Debias: 1.0
Max Train Steps: 201,010 (completed: 101,282)
Current Checkpoint: ~50% through planned training
Trainable Parameters: P: 1,956,405,248 | Frozen: 6.44% (llm_adapter.*)
Soft Text Conditioning Range: 0.75 to 1.25
Dataset Size: 36,202 images
Training Resolution: 1152x1152 (Aspect Ratio Bucketed: 864x1536 to 1536x864)
VRAM-Saving Techniques: momentum offloading, bf16 mixed precision, pre-caching of VAE latents and text-encoder outputs, gradient checkpointing
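The bucket range above keeps roughly the same pixel count as the 1152x1152 base (1152 × 1152 = 864 × 1536 = 1,327,104). A minimal aspect-ratio bucketing sketch under that reading is shown below. The 32-pixel step is an assumption (it is the largest power of two dividing both 864 and 1536), and `make_buckets` / `assign_bucket` are hypothetical helpers, not the trainer's functions.

```python
import math

BASE = 1152                     # training resolution from this card (1152x1152)
MIN_SIDE, MAX_SIDE = 864, 1536  # bucket limits from this card
STEP = 32                       # assumed bucket granularity; not stated in the card


def make_buckets() -> list[tuple[int, int]]:
    """Build (width, height) buckets that each hold roughly BASE * BASE pixels."""
    area = BASE * BASE
    buckets = []
    for w in range(MIN_SIDE, MAX_SIDE + 1, STEP):
        h = round(area / w / STEP) * STEP  # snap height to the step grid
        if MIN_SIDE <= h <= MAX_SIDE:
            buckets.append((w, h))
    return buckets


def assign_bucket(width: int, height: int, buckets: list[tuple[int, int]]) -> tuple[int, int]:
    """Pick the bucket whose aspect ratio is closest to the image's, compared in log space."""
    target = math.log(width / height)
    return min(buckets, key=lambda b: abs(math.log(b[0] / b[1]) - target))


if __name__ == "__main__":
    buckets = make_buckets()
    print(len(buckets), buckets[0], buckets[-1])
    print(assign_bucket(1000, 1800, buckets))  # a tall image maps to (864, 1536)
```

After assignment, each image would typically be resized to cover its bucket and center-cropped, so every batch shares one resolution.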
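Batch size 1 with 4 gradient-accumulation steps gives an effective batch of 4: gradients from four single-image passes are averaged before one optimizer update. Raven's own code is not reproduced here. The sketch below is a plain AdamW step with decoupled weight decay (the same update order as PyTorch's `torch.optim.AdamW`) on a list of floats, using the learning rate, betas, eps, and weight decay listed above. Raven's offloading and its "Debias" setting are not described in this card, so both are left out; the toy loss in the usage example is hypothetical.

```python
import math
from typing import Callable, Sequence

LR, BETA1, BETA2, EPS, WD = 5e-6, 0.9, 0.999, 1e-8, 0.01  # values listed in this card
ACCUM_STEPS = 4  # batch size 1 x 4 accumulation steps = effective batch of 4


class AdamW:
    """Minimal AdamW with decoupled weight decay over a flat list of parameters.

    Python floats (float64) stand in for the optimizer's float32 states.
    """

    def __init__(self, n: int) -> None:
        self.m = [0.0] * n  # first-moment (momentum) estimates
        self.v = [0.0] * n  # second-moment estimates
        self.t = 0          # number of updates, used for bias correction

    def step(self, params: list[float], grads: Sequence[float]) -> None:
        self.t += 1
        bc1 = 1 - BETA1 ** self.t
        bc2 = 1 - BETA2 ** self.t
        for i, g in enumerate(grads):
            params[i] *= 1 - LR * WD  # decoupled weight decay, applied before the Adam update
            self.m[i] = BETA1 * self.m[i] + (1 - BETA1) * g
            self.v[i] = BETA2 * self.v[i] + (1 - BETA2) * g * g
            m_hat = self.m[i] / bc1
            v_hat = self.v[i] / bc2
            params[i] -= LR * m_hat / (math.sqrt(v_hat) + EPS)


def train_step(
    params: list[float],
    opt: AdamW,
    micro_batches: Sequence[object],
    grad_fn: Callable[[list[float], object], list[float]],
) -> None:
    """Average gradients over ACCUM_STEPS micro-batches, then apply a single update."""
    if len(micro_batches) != ACCUM_STEPS:
        raise ValueError(f"expected {ACCUM_STEPS} micro-batches")
    acc = [0.0] * len(params)
    for batch in micro_batches:
        for i, g in enumerate(grad_fn(params, batch)):
            acc[i] += g / ACCUM_STEPS
    opt.step(params, acc)


if __name__ == "__main__":
    # Toy loss sum((p - x)^2) per micro-batch; its gradient is 2 * (p - x).
    grad = lambda ps, x: [2 * (p - x) for p in ps]
    params = [1.0, -1.0]
    opt = AdamW(len(params))
    train_step(params, opt, [0.0] * ACCUM_STEPS, grad)
    print(params)  # each value moves about LR toward 0 on the first update
```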
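The "Wave" timestep setting is not defined in this card. One hypothetical reading, used only for illustration, is a sinusoidally weighted sampling density over normalized timesteps t in [0, 1], proportional to 1 + Amp · sin(2π · Freq · t + Phase). With Phase ≈ π this puts more samples in the upper half of the range; whether larger t means more noise depends on the trainer's convention, which the card does not state. `wave_density` and `sample_timestep` are hypothetical names.

```python
import math
import random

FREQ, PHASE, AMP = 1.0, 3.14, 0.60  # "Wave" settings listed in this card


def wave_density(t: float) -> float:
    """Assumed unnormalized density over t in [0, 1]: 1 + AMP * sin(2*pi*FREQ*t + PHASE)."""
    return 1.0 + AMP * math.sin(2 * math.pi * FREQ * t + PHASE)


def sample_timestep(rng: random.Random) -> float:
    """Rejection-sample t in [0, 1) from wave_density (needs AMP < 1 so the density stays positive)."""
    peak = 1.0 + AMP  # upper bound of wave_density
    while True:
        t = rng.random()
        if rng.random() * peak < wave_density(t):
            return t


if __name__ == "__main__":
    rng = random.Random(0)
    samples = [sample_timestep(rng) for _ in range(20_000)]
    upper = sum(t >= 0.5 for t in samples) / len(samples)
    print(f"share of samples with t >= 0.5: {upper:.3f}")
```

Under this assumed density, integrating over [0.5, 1] gives about 0.5 + 0.6/π ≈ 0.69 of the samples in the upper half of the range.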
Recommended Generation Settings
Sampler: ER_SDE
Scheduler: Beta
Steps: 15-50
CFG: 3-5
Negative Prompt: worst quality, low quality, lowres, score_1, score_2, score_3, blurry, jpeg artifacts
Note: Use qwen_3_06b_base.safetensors as the text encoder and qwen_image_vae.safetensors as the VAE.
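The CFG value refers to classifier-free guidance. In samplers that support negative prompts, the prediction for the negative prompt takes the place of the unconditional prediction, and the guided result is neg + cfg × (pos − neg). A minimal sketch on plain lists; `guide` is a hypothetical name, and real samplers apply the same formula to model output tensors at every step.

```python
def guide(pos: list[float], neg: list[float], cfg: float) -> list[float]:
    """Classifier-free guidance: start at the negative prediction and move cfg times
    the (positive - negative) difference toward the positive one."""
    if len(pos) != len(neg):
        raise ValueError("predictions must have the same shape")
    return [n + cfg * (p - n) for p, n in zip(pos, neg)]


if __name__ == "__main__":
    print(guide([1.0, 0.0], [0.0, 0.0], 4.0))  # [4.0, 0.0]
```

At CFG 1 the result equals the positive prediction; the recommended 3 to 5 pushes three to five times as far along the (positive − negative) direction.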
Notes
Feedback is welcome, especially on prompt following, anatomy, hands, style consistency, repeated patterns, overfitting, and behavior without heavy negative prompts.
License
This model follows the license of its base model, Anima. Review and comply with the base model terms before using or redistributing.