Akanezora
Akanezora-Anima
Akanezora is a full Anima DiT fine-tune trained entirely on a single RTX 3060 (12 GB VRAM) using the Aozora trainer. It is released both as a usable model and as proof that full Anima DiT fine-tuning can be done on consumer hardware.
If you're interested in fine-tuning your own model, the training code is available here: [Aozora_SDXL_Trainer]
Branch A vs B
Branch A: follows the recommended Anima tuning setup with bell-weighted loss and non-uniform timestep sampling.
Branch B: uses my experimental setup with uniform timestep sampling and uniform loss weighting, giving the model more even exposure across the full noise range. In my testing, this improves visual style and composition, but is slower and harder to tune.
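To make the difference between the two branches concrete, here is a minimal Python sketch. It assumes a continuous timestep t in [0, 1]. The card does not say which non-uniform distribution or bell curve the recommended Anima setup uses, so logit-normal sampling and a Gaussian loss weight are hypothetical stand-ins, and all function names are made up for illustration. This is not the Aozora implementation.

```python
import math
import random

# Illustrative only: logit-normal sampling and a Gaussian weight are assumed
# stand-ins for Branch A's "non-uniform timestep sampling" and
# "bell-weighted loss"; the real Anima/Aozora choices may differ.

def sample_t_uniform(rng: random.Random) -> float:
    """Branch B (uniform): every noise level in [0, 1) is equally likely."""
    return rng.random()

def sample_t_logit_normal(rng: random.Random, mean: float = 0.0, std: float = 1.0) -> float:
    """Branch A stand-in: logit-normal draw that concentrates mass on mid-range noise."""
    return 1.0 / (1.0 + math.exp(-rng.gauss(mean, std)))

def loss_weight_uniform(t: float) -> float:
    """Branch B: every timestep contributes equally to the loss."""
    return 1.0

def loss_weight_bell(t: float, center: float = 0.5, width: float = 0.2) -> float:
    """Branch A stand-in: Gaussian bump that peaks at `center` and fades at the extremes."""
    return math.exp(-((t - center) ** 2) / (2 * width ** 2))

def extreme_fraction(ts) -> float:
    """Share of sampled timesteps in the very low or very high noise range."""
    return sum(1 for t in ts if t < 0.1 or t > 0.9) / len(ts)

if __name__ == "__main__":
    rng = random.Random(0)
    uniform_ts = [sample_t_uniform(rng) for _ in range(100_000)]
    bell_ts = [sample_t_logit_normal(rng) for _ in range(100_000)]
    print("uniform extremes:", extreme_fraction(uniform_ts))  # roughly 0.20
    print("logit-normal extremes:", extreme_fraction(bell_ts))  # roughly 0.03
```

Under these assumptions, uniform sampling spends about 20% of its draws in the outer 10% at each end of the noise range, versus roughly 3% for the logit-normal stand-in. That gap is the "more even exposure" Branch B is aiming for.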
Version 0.55b Preview
This is a checkpoint taken about 55% of the way through planned training. It continues from the 0.5a checkpoint and was trained on 15k images using the experimental Branch B settings. The dataset consists of 50% Danbooru-tagged images with generated natural-language captions and 50% hand-tagged / natural-language-captioned images. During training, text conditioning was split into 90% tag prompts and 10% natural-language prompts.
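The 90% tags / 10% natural-language split can be pictured as a per-sample coin flip at data-loading time. The sketch below assumes every image carries both caption types and that the choice is made independently on each step; `pick_caption` is a hypothetical name, and Aozora may implement the split differently.

```python
import random

def pick_caption(tags: str, natural_language: str, rng: random.Random,
                 p_tags: float = 0.9) -> str:
    """Choose which caption conditions this training step.

    Hypothetical helper: returns the Danbooru tag string with probability
    `p_tags` (0.9 per this card) and the natural-language caption otherwise.
    Falls back to whichever caption exists if one is empty (an assumption).
    """
    if not natural_language:
        return tags
    if not tags:
        return natural_language
    return tags if rng.random() < p_tags else natural_language

if __name__ == "__main__":
    rng = random.Random(42)
    picks = [pick_caption("1girl, solo, red sky", "A girl alone under a red sky.", rng)
             for _ in range(10_000)]
    # Share of steps that used the tag caption; should land close to 0.9.
    print(sum(p.startswith("1girl") for p in picks) / len(picks))
```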
While this release is functional, it should be considered a work in progress rather than a finished product. It is being shared early to demonstrate the viability of the training method and to showcase the Aozora trainer’s ability to fine-tune Anima DiT models on low-VRAM hardware.
Pros:
Reduces unwanted text generation by around 70%, lessening the need for heavy negative prompts.
More responsive to Danbooru-style tags.
Slightly more dynamic seed variation due to soft conditioning.
Generation style leans closer to SDXL.
Better overall composition and feel across varied prompts.
Improved NSFW output quality.
Cons:
Some seeds may still closely resemble the base model.
Lighting effects often need to be prompted directly.
Still early in training and may hallucinate content.
0.55b Training Settings
Base Model: Akanezora V0.5a
Training Hours: Unknown (a power outage made the run take about twice as long; estimated at around 50 hours)
GPU Used: NVIDIA GeForce RTX 3060 (12 GB) | Driver version: 32.0.15.9636
VRAM Usage: ~11.4GB
Mixed Precision: bfloat16
Batch Size: 1
Gradient Accumulation: 4
Learning Rate: 6e-6
Timestep: Uniform
Loss: Uniform
Optimizer: Raven (AdamW float32 variant with offloading) | betas: 0.9, 0.999 | eps: 1e-08 | weight decay: 0.01 | debias: 1.0
Max Train Steps: 201,010 (completed: 115,282)
Current Checkpoint: ~55% through planned training
Trainable Parameters: P = 1,956,405,248 | P frozen: 6.44% (llm_adapter.*)
Soft Text Conditioning: 0.75–1.25
Dataset Size: 15164
Training Resolution: 1152x1152 (Aspect Ratio Bucketed: 864x1536 to 1536x864)
VRAM-saving techniques: momentum offloading, bf16 mixed precision, pre-caching of VAE and text encoder outputs, gradient checkpointing
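The bucket range above is equal-area: 864 × 1536 and 1536 × 864 both contain exactly 1,327,104 pixels, the same as 1152 × 1152. A minimal sketch of how such buckets could be enumerated and assigned follows. The 32-pixel step is an assumption chosen only because both listed extremes are multiples of 32, and `make_buckets` / `assign_bucket` are hypothetical names, not Aozora functions.

```python
import math

def make_buckets(base: int = 1152, min_side: int = 864, max_side: int = 1536,
                 step: int = 32):
    """Enumerate (width, height) buckets with area at most base * base.

    Assumes sides are multiples of `step`; 32 is a guess that fits both listed
    extremes (864 and 1536), not a value confirmed by Aozora.
    """
    target_area = base * base
    buckets = []
    for w in range(min_side, max_side + 1, step):
        # Largest multiple of `step` that keeps w * h within the target area.
        h = (target_area // w) // step * step
        if min_side <= h <= max_side:
            buckets.append((w, h))
    return buckets

def assign_bucket(width: int, height: int, buckets):
    """Pick the bucket whose aspect ratio is closest to the image's (log scale)."""
    ar = math.log(width / height)
    return min(buckets, key=lambda b: abs(math.log(b[0] / b[1]) - ar))

if __name__ == "__main__":
    buckets = make_buckets()
    print(len(buckets), buckets[0], buckets[-1])
    print(assign_bucket(2048, 2048, buckets))  # (1152, 1152)
```

Comparing aspect ratios in log space treats 2:1 and 1:2 symmetrically, so portrait and landscape images are matched equally well.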
v0.50 settings: bf16 mixed precision, batch 1, grad accum 4, LR 5e-6, Raven AdamW offload optimizer, wave timestep schedule, soft text conditioning 0.75–1.25, 1152 bucketed training from 864x1536 to 1536x864, VAE/text encoder pre-cache, gradient checkpointing, and momentum offloading.
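The card does not define "soft text conditioning". One plausible reading, which is an assumption and not confirmed by the source, is that each caption's text-encoder output is multiplied by a random strength drawn from 0.75–1.25 during training. That would be consistent with the "more dynamic seed variation" listed under Pros. A minimal pure-Python sketch of that reading, with a hypothetical `soft_condition` helper:

```python
import random
from typing import List, Tuple

Embedding = List[List[float]]  # [tokens][hidden_dim]

def soft_condition(embedding: Embedding, rng: random.Random,
                   low: float = 0.75, high: float = 1.25) -> Tuple[Embedding, float]:
    """Scale one caption's text-encoder output by a single random factor.

    Assumed interpretation of "soft text conditioning (0.75-1.25)": the same
    factor, drawn uniformly from [low, high], multiplies every token vector,
    so the prompt's direction is kept while its strength varies per step.
    """
    scale = rng.uniform(low, high)
    scaled = [[x * scale for x in token] for token in embedding]
    return scaled, scale

if __name__ == "__main__":
    emb = [[0.5, -1.0, 2.0], [1.5, 0.0, -0.25]]
    out, s = soft_condition(emb, random.Random(7))
    print(round(s, 3), out)
```

In a real trainer this would operate on tensors right after the pre-cached text-encoder outputs are loaded; plain lists keep the sketch dependency-free.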
Recommended Generation Settings
Sampler: ER_SDE
Scheduler: Beta
Steps: 15-50
CFG: 3-5
Negative Prompt: worst quality, low quality, lowres, score_1, score_2, score_3, blurry, jpeg artifacts
Note: You need to use qwen_3_06b_base.safetensors as the text encoder and qwen_image_vae.safetensors as the VAE.
Model Transparency Notice
For transparency, this release includes the training setup and links back to the open-source trainer/code used to create it. This is a full fine-tune checkpoint with no LoRA, LoKR, LyCORIS, or model merge operations applied.
This model:
Started training from unmodified base weights
Has zero merge operations applied
Uses no LoRA adapters, ever
Was trained fully end-to-end, with no sublayer freezing besides the required llm_adapter
Notes
Feedback is welcome, especially on prompt following, anatomy, hands, style consistency, repeated patterns, overfitting, and behavior without heavy negative prompts.
License
This model follows the license of its base model, Anima. Review and comply with the base model terms before using or redistributing.
















