Akanezora-Anima

Akanezora is a full Anima DiT fine-tune trained entirely on a single RTX 3060 with 12GB of VRAM using Aozora. It is released both as a usable model and as proof that full Anima DiT fine-tuning can be done on consumer hardware.

If you're interested in fine-tuning your own model, the training code is available here: [Aozora_SDXL_Trainer]

Quantized Versions

FP8: Recommended for RTX 40-series or newer GPUs with native FP8 support. Saves VRAM with near-base quality; older GPUs may use slower FP8 emulation.

INT8: Recommended for RTX 20-series or newer. Uses ComfyUI int8_tensorwise loading and requires ComfyUI v0.27.0+.

GGUF Q8_0 / Q5_1: Broad low-VRAM GPU support through [ComfyUI-GGUF]. Use its Unet Loader (GGUF) node. Q8_0 is the most faithful to the base weights; Q5_1 is smaller, but the same seed will drift further from the unquantized result.

For non-GGUF quants, use torch.compile for additional speed if your setup supports it. If a model won't load, first make sure ComfyUI is updated to the latest version (v0.27.0+). For GGUF, you may need to install the gguf package into ComfyUI's Python environment via python -m pip install --upgrade gguf
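If the GGUF loader still fails, a quick check like the one below can confirm whether the gguf package is visible to the interpreter. This is only an illustrative sketch: `has_module` is a made-up helper name, and the script has to be run with the same Python that ComfyUI uses (for example the embedded interpreter of a portable install), otherwise the result says nothing about ComfyUI.

```python
import importlib.util
import sys


def has_module(name: str) -> bool:
    """Return True if `name` can be found by the current interpreter."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if has_module("gguf"):
        print("gguf is available for", sys.executable)
    else:
        # Print the exact command for *this* interpreter so the package
        # lands in the right environment.
        print("gguf is missing; install it with:")
        print(f'  "{sys.executable}" -m pip install --upgrade gguf')
```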

Unlike most mixed-precision quants that trim a few layers for modest savings, these are fully quantized across every supported layer and then QAT-repaired via retraining to hold quality near base. This took extra weeks, but I think it was worth it.
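For readers unfamiliar with what "tensorwise" quantization means, here is a minimal sketch of generic symmetric per-tensor INT8 quantization in plain Python. It is not the code from [Convert_Anima_to_quants] or ComfyUI's loader, the function names are hypothetical, and the QAT repair step is not shown; it only illustrates the rounding error that a single shared scale introduces.

```python
def quantize_tensorwise(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric int8 quantization using one scale for the whole tensor."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Map the largest magnitude to 127; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_tensorwise(q: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from int8 values and the scale."""
    return [v * scale for v in q]


def max_abs_error(a: list[float], b: list[float]) -> float:
    """Largest element-wise absolute difference between two lists."""
    return max(abs(x - y) for x, y in zip(a, b))


if __name__ == "__main__":
    w = [0.02, -0.5, 0.31, 1.27, -0.004]
    q, scale = quantize_tensorwise(w)
    w_hat = dequantize_tensorwise(q, scale)
    print("int8 values:", q)
    print("scale:", scale)
    print("max round-trip error:", max_abs_error(w, w_hat))
```

Because one scale covers the whole tensor, small weights lose relatively more precision than large ones. Errors of this kind are what a retraining pass would have to compensate for when every supported layer is quantized.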

Converting Anima to GGUF was a hassle because ComfyUI uses a custom quantized loader and naming format, while GGUF doesn’t officially support Anima and only recognizes its underlying Cosmos architecture. Some shenanigans were needed to make everything compatible with base ComfyUI and the ComfyUI-GGUF addon.

Since I think everyone deserves access to lower-VRAM models, here is the complete open-source code for converting Anima models to GGUF or other quantized formats like FP8: [Convert_Anima_to_quants]. Enjoy all 4100 lines of nonsensical experimental code that does indeed work.

Version 0.65b Preview

A higher-resolution retrain branched from the original Akanezora run. It starts from the new Anima Aesthetic-v1.0 release as a shortcut so the model adapts to 1536px faster, and it trains on a stripped-down dataset of 11,093 hand-curated images at up to 1536px.

Improves composition at higher resolutions, Danbooru-tag response, NSFW quality, and unwanted-text suppression. As an early preview, occasional hallucinations remain and some outputs may resemble the base model.

This higher-resolution retrain now allows highres fix to work properly, so I recommend using it.

Training

Base: Anima Aesthetic-v1.0
Resolution: Up to 1536px, aspect-ratio bucketed
GPU: RTX 3060 12 GB
Precision: BF16
Batch: 1 × 4 gradient accumulation
Optimizer: Raven/AdamW, momentum offloading
Learning rate: 3e-6 peak, custom schedule
Objective: Custom weighted flow-matching MSE
Sampling: Stratified Uniform timesteps
Frozen: llm_adapter.*
Conditioning: Danbooru tags, 0.9–1.1 soft scaling
Memory: Approximately 11.4 GB using caching, checkpointing and SDPA
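
As a rough illustration of the "Stratified Uniform timesteps" entry, the sketch below draws one timestep from each equal-width slice of [0, 1) so that a small group of samples covers the whole range instead of clustering. How Aozora actually applies stratification is not described here, so spreading the strata across the 4 micro-batches of one accumulation window is an assumption, and `stratified_uniform_timesteps` is a hypothetical name.

```python
import random


def stratified_uniform_timesteps(n: int, rng: random.Random) -> list[float]:
    """Draw n timesteps, one from each of n equal-width strata of [0, 1)."""
    if n <= 0:
        raise ValueError("n must be positive")
    width = 1.0 / n
    # Stratum i covers [i/n, (i+1)/n); jitter uniformly inside it.
    steps = [(i + rng.random()) * width for i in range(n)]
    rng.shuffle(steps)  # avoid always visiting strata in ascending order
    return steps


if __name__ == "__main__":
    rng = random.Random(0)
    accumulation_steps = 4  # from "Batch: 1 × 4 gradient accumulation"
    timesteps = stratified_uniform_timesteps(accumulation_steps, rng)
    for micro_step, t in enumerate(timesteps):
        # A real training loop would compute the loss at timestep t and
        # accumulate gradients here, stepping the optimizer once after
        # the last micro-batch.
        print(f"micro-batch {micro_step}: t = {t:.3f}")
```

Compared with plain uniform sampling, this guarantees that every optimizer step sees low, middle, and high noise levels, which matters more when the effective batch is as small as 4.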

Recommended Generation Settings

Sampler: ER_SDE
Scheduler: Beta

Steps: 15-50

CFG: 3-5

Negative Prompt: worst quality, low quality, lowres, score_1, score_2, score_3, blurry, jpeg artifacts
Note: Use qwen_3_06b_base.safetensors as the text encoder and qwen_image_vae.safetensors as the VAE.
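
For anyone scripting generations, the settings above can be kept in one place and sanity-checked before a run. The snippet below is just a convenience sketch: the dictionary keys and `check_settings` are hypothetical names, and the sampler and scheduler strings are copied as written on this page, so they may need adjusting to the identifiers your frontend expects.

```python
RECOMMENDED = {
    "sampler": "ER_SDE",
    "scheduler": "Beta",
    "steps": (15, 50),  # inclusive range
    "cfg": (3.0, 5.0),  # inclusive range
    "negative_prompt": (
        "worst quality, low quality, lowres, score_1, score_2, score_3, "
        "blurry, jpeg artifacts"
    ),
    "text_encoder": "qwen_3_06b_base.safetensors",
    "vae": "qwen_image_vae.safetensors",
}


def check_settings(steps: int, cfg: float) -> list[str]:
    """Return a warning for each value outside the recommended range."""
    warnings = []
    lo_steps, hi_steps = RECOMMENDED["steps"]
    lo_cfg, hi_cfg = RECOMMENDED["cfg"]
    if not lo_steps <= steps <= hi_steps:
        warnings.append(f"steps={steps} is outside {lo_steps}-{hi_steps}")
    if not lo_cfg <= cfg <= hi_cfg:
        warnings.append(f"cfg={cfg} is outside {lo_cfg}-{hi_cfg}")
    return warnings


if __name__ == "__main__":
    for message in check_settings(steps=10, cfg=7.0):
        print("warning:", message)
```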

Model Transparency Notice

For transparency, this release includes the training setup and links back to the open-source trainer/code used to create it. This is a full fine-tune checkpoint with no LoRA, LoKR, LyCORIS, or model merge operations applied.

This model:

Training started with unmodified base weights
Zero merge operations applied
No LoRA adapters — ever
Full end-to-end training, no sublayer freezing besides the required (llm_adapter)

Notes

Feedback is welcome, especially on prompt following, anatomy, hands, style consistency, repeated patterns, overfitting, and behavior without heavy negative prompts.

License

This model follows the license of its base model, Anima. Review and comply with the base model terms before using or redistributing.

Model type	Checkpoint
Base model	Anima
Published	2026-07-13

Akanezora - Multi-Precision + GGUF
