Zanime

# 🔬 Z-Image Base Anime Finetuning – Full Technical Test Report
Epoch 100 Evaluation

This report documents the complete testing and evaluation of a
Z-Image Base anime finetuning checkpoint, including training details,
inference settings, prompt engineering findings, and sampler recommendations.

All findings are based on hands-on testing with the Epoch 100 checkpoint.

------------------------------------------------------------
🧠 TRAINING DETAILS
------------------------------------------------------------

Base Model:        Z-Image Base (Tongyi-MAI, released Jan 27, 2026)
Architecture:      S3-DiT (Single-Stream Diffusion Transformer)
Text Encoder:      Qwen-based (bilingual EN/CN)
Training Type:     Checkpoint Finetuning (not LoRA)
Epochs:            100
Steps:             68,830
Dataset Size:      1,375 Anime Images
Tagging System:    WD Tagger (Booru-style tags)

Avg Tags/Image:    ~47 tags
Unique Tags:       4,834
Total Tag Count:   64,276
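
The average tag count can be re-derived from the two totals above. A minimal sketch; the variable names are illustrative and not taken from the training tooling:

```python
# Figures copied from the training details above.
dataset_size = 1_375       # images in the dataset
total_tag_count = 64_276   # tag occurrences across all captions

# Mean number of tags per image.
avg_tags_per_image = total_tag_count / dataset_size
print(f"{avg_tags_per_image:.2f}")  # 46.75, reported above as ~47
```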


------------------------------------------------------------
📐 DATASET RESOLUTION DISTRIBUTION
------------------------------------------------------------

Resolution   | Count | Ratio   | Quality
-------------|-------|---------|---------
1344x1728    | 132   | 7:9     | ✅ Good
768x1086     | 95    | ~5:7    | ⚠️ Odd-Size
832x1216     | 47    | ~2:3    | ✅ SD-Standard
1152x1536    | 45    | 3:4     | ✅ Good
768x1084     | 36    | ~5:7    | ⚠️ Odd-Size
768x1024     | 28    | 3:4     | ✅ Perfect
896x1152     | 22    | 7:9     | ✅ Good
1365x768     | 20    | ~16:9   | ↔️ Landscape
1248x1824    | 19    | ~2:3    | ✅ Good
768x768      | 19    | 1:1     | ✅ Standard
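
The exact ratio of any bucket can be checked by dividing both sides by their greatest common divisor. A minimal sketch; `aspect_ratio` is a hypothetical helper, not part of any training script. Buckets such as 832x1216 reduce to 13:19, which is why they can only be described as approximately 2:3.

```python
from math import gcd

def aspect_ratio(width: int, height: int) -> str:
    """Reduce width x height to its smallest whole-number ratio."""
    divisor = gcd(width, height)
    return f"{width // divisor}:{height // divisor}"

# A few of the resolutions from the table above.
for w, h in [(1344, 1728), (1152, 1536), (832, 1216), (896, 1152), (768, 768)]:
    print(f"{w}x{h} -> {aspect_ratio(w, h)}")
```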


------------------------------------------------------------
🏷 TOP 50 TRAINING TAGS
------------------------------------------------------------

 1. 1162x  1girl
 2. 1034x  looking_at_viewer
 3. 1033x  solo
 4. 1001x  breasts
 5.  980x  long_hair
 6.  839x  blush
 7.  689x  smile
 8.  595x  large_breasts
 9.  529x  long_sleeves
10.  520x  closed_mouth
11.  466x  open_mouth
12.  456x  bare_shoulders
13.  426x  hair_between_eyes
14.  422x  shirt
15.  420x  thighs
16.  417x  blue_eyes
17.  380x  cleavage
18.  376x  medium_breasts
19.  370x  short_hair
20.  354x  hair_ornament
21.  344x  black_hair
22.  340x  collarbone
23.  328x  dress
24.  327x  simple_background
25.  317x  jewelry
26.  308x  holding
27.  299x  indoors
28.  298x  navel
29.  297x  sitting
30.  285x  outdoors
31.  284x  standing
32.  282x  gloves
33.  275x  skirt
34.  270x  very_long_hair
35.  269x  jacket
36.  269x  white_background
37.  268x  animal_ears
38.  259x  brown_hair
39.  253x  blonde_hair
40.  236x  thighhighs
41.  232x  white_shirt
42.  225x  red_eyes
43.  220x  parted_lips
44.  219x  multicolored_hair
45.  216x  cowboy_shot
46.  214x  bow
47.  214x  sky
48.  214x  sweat
49.  207x  ribbon
50.  207x  purple_eyes

------------------------------------------------------------
🏷 TOP 50 NSFW TRAINING TAGS
------------------------------------------------------------

 1.    139x  nipples
 2.    120x  nude
 3.    117x  uncensored
 4.    111x  pussy
 5.    100x  from_behind          
 6.     98x  lying                
 7.     94x  penis
 8.     81x  sex
 9.     80x  covered_nipples      
 10.    73x  completely_nude
 11.    72x  bra                  
 12.    70x  sideboob
 13.    67x  ass_visible_through_thighs
 14.    65x  spread_legs          
 15.    64x  vaginal
 16.    61x  pussy_juice
 17.    60x  testicles
 18.    59x  saliva               
 19.    57x  cameltoe
 20.    53x  erection
 21.    52x  anus
 22.    50x  pov                  
 23.    45x  sex_from_behind
 24.    44x  sex_from_behind
 25.    41x  huge_breasts         
 26.    40x  pubic_hair
 27.    39x  clothed_sex
 28.    38x  cum
 29.    36x  bottomless
 30.    35x  bent_over            
 31.    34x  wet_clothes          
 32.    33x  oral
 33.    32x  straddling           
 34.    31x  no_bra
 35.    31x  breasts_apart
 36.    30x  ass_grab
 37.    29x  cum_in_pussy
 38.    29x  clitoris
 39.    27x  ahegao
 40.    27x  rolling_eyes         
 41.    26x  yuri
 42.    25x  fellatio
 43.    25x  breasts_out
 44.    24x  underwear_only
 45.    23x  bdsm
 46.    22x  standing_sex
 47.    22x  cleft_of_venus
 48.    22x  doggystyle
 49.    22x  anal
 50.    21x  cum_overflow

------------------------------------------------------------
⚙ INFERENCE SETTINGS – WHAT WORKS
------------------------------------------------------------

Recommended Setup:

CFG:               4 – 6  (sweet spot confirmed)
Steps:             30 – 40
Resolution:        768x1024 (primary)
                   832x1216 (more detail)
ModelSamplingFlow: Shift 3.0  ← important
CFG Normalization: NOT tested 
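
To keep experiments inside the tested ranges, the recommended setup can be encoded as a small check. This is only a sketch: the constant names and `check_settings` are hypothetical, and the ranges are copied from the list above.

```python
# Ranges copied from the recommended setup; names are illustrative.
RECOMMENDED_CFG = (4.0, 6.0)
RECOMMENDED_STEPS = (30, 40)
RECOMMENDED_RESOLUTIONS = {(768, 1024), (832, 1216)}
RECOMMENDED_SHIFT = 3.0

def check_settings(cfg, steps, width, height, shift):
    """Return a list of warnings for values outside the tested ranges."""
    warnings = []
    low, high = RECOMMENDED_CFG
    if not low <= cfg <= high:
        warnings.append(f"CFG {cfg} is outside the tested range {low}-{high}")
    low, high = RECOMMENDED_STEPS
    if not low <= steps <= high:
        warnings.append(f"{steps} steps is outside the tested range {low}-{high}")
    if (width, height) not in RECOMMENDED_RESOLUTIONS:
        warnings.append(f"{width}x{height} is not a recommended resolution")
    if shift != RECOMMENDED_SHIFT:
        warnings.append(f"shift {shift} differs from the recommended {RECOMMENDED_SHIFT}")
    return warnings

print(check_settings(cfg=5, steps=35, width=768, height=1024, shift=3.0))
```

An empty list means every value falls inside the ranges that were actually tested; anything else is untested territory rather than a guaranteed failure.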

------------------------------------------------------------
🎛 SAMPLER & SCHEDULER RESULTS
------------------------------------------------------------

CONFIRMED WORKING (anime-style output):

✔ Euler Ancestral  + Simple
✔ Euler Ancestral  + Normal
✔ DPM++ 2M         + Simple
✔ DPM++ 2M         + Normal
✔ DPM++ 2M SDE     + Simple
✔ DPM++ 3M SDE     + Simple
✔ Res Multistep    + Simple
✔ Res Multistep    + Normal

COMPLETELY BROKEN (unrecognizable output):

✘ All Karras variants
✘ All Exponential variants

Notes:

→ DPM++ 2M SDE and DPM++ 3M SDE tend to produce more realistic-looking backgrounds
→ All 8 working sampler/scheduler combinations produce top-quality results
→ Personal preference decides final choice
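
The results above can be captured in a lookup that separates confirmed, broken, and untested combinations. A minimal sketch using the display names from this report; `combo_status` is a hypothetical helper.

```python
# Combinations confirmed in this report (display names, not node identifiers).
WORKING = {
    ("Euler Ancestral", "Simple"), ("Euler Ancestral", "Normal"),
    ("DPM++ 2M", "Simple"), ("DPM++ 2M", "Normal"),
    ("DPM++ 2M SDE", "Simple"), ("DPM++ 3M SDE", "Simple"),
    ("Res Multistep", "Simple"), ("Res Multistep", "Normal"),
}
# Schedulers reported as broken with every sampler.
BROKEN_SCHEDULERS = {"Karras", "Exponential"}

def combo_status(sampler: str, scheduler: str) -> str:
    """Classify a sampler/scheduler pair according to the test results."""
    if scheduler in BROKEN_SCHEDULERS:
        return "broken"
    if (sampler, scheduler) in WORKING:
        return "confirmed"
    return "untested"

print(combo_status("DPM++ 2M SDE", "Simple"))  # confirmed
print(combo_status("DPM++ 2M SDE", "Normal"))  # untested
print(combo_status("Euler Ancestral", "Karras"))  # broken
```

Note that a pair missing from the confirmed list, such as DPM++ 2M SDE with Normal, is reported as untested rather than broken.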

------------------------------------------------------------
🧪 PROMPT ENGINEERING FINDINGS
------------------------------------------------------------

WD Tags (Booru-style):
+ Fast to write
+ Good character details
+ Good clothing recognition
- Slightly flatter clothing textures
- Less atmospheric backgrounds
- Less "alive" feeling overall

Fulltext English:
+ Richer clothing details and textures
+ Better atmospheric backgrounds
+ More dynamic and "alive" feeling
+ Makes full use of the Qwen text encoder
+ Better scene composition
- Slightly longer to write

------------------------------------------------------------
🏆 WINNING PROMPT STRUCTURE – LAYERED FULLTEXT
------------------------------------------------------------

1. Opening line – Subject + Style
2. Character details – Clothing + Features
3. Action + Pose
4. Foreground + immediate environment
5. Background description
6. Composition + Lighting + Meta
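
The six layers above can be assembled programmatically so that each prompt keeps the same order. A minimal sketch; the `LayeredPrompt` class is hypothetical and the example wording is invented purely for illustration, not taken from the test prompts.

```python
from dataclasses import dataclass

@dataclass
class LayeredPrompt:
    subject_and_style: str          # 1. Opening line
    character_details: str          # 2. Clothing + features
    action_and_pose: str            # 3. Action + pose
    foreground: str                 # 4. Foreground + immediate environment
    background: str                 # 5. Background description
    composition_and_lighting: str   # 6. Composition + lighting + meta

    def render(self) -> str:
        """Join the non-empty layers, in order, as separate sentences."""
        layers = [
            self.subject_and_style, self.character_details,
            self.action_and_pose, self.foreground,
            self.background, self.composition_and_lighting,
        ]
        return " ".join(s.strip().rstrip(".") + "." for s in layers if s.strip())

prompt = LayeredPrompt(
    subject_and_style="An anime illustration of a girl with long black hair",
    character_details="She wears a white shirt and a dark jacket",
    action_and_pose="She is sitting and looking at the viewer with a smile",
    foreground="A wooden table with a cup of tea in front of her",
    background="A quiet room with a window behind her",
    composition_and_lighting="Cowboy shot, soft daylight",
)
print(prompt.render())
```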

------------------------------------------------------------
🚫 NEGATIVE PROMPT FINDINGS
------------------------------------------------------------

Rule:
POSITIVE → Fulltext
NEGATIVE → Short keyword tags
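
A negative prompt built from short keyword tags can be normalized and de-duplicated before use. A minimal sketch; `negative_prompt` is a hypothetical helper, and the example tags are illustrative choices based on the anatomy issues listed under Known Limitations.

```python
def negative_prompt(tags):
    """Lower-case, trim, and de-duplicate tags, keeping their original order."""
    seen = []
    for tag in tags:
        cleaned = tag.strip().lower()
        if cleaned and cleaned not in seen:
            seen.append(cleaned)
    return ", ".join(seen)

print(negative_prompt(["extra fingers", "malformed hands", "floating limbs"]))
```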

------------------------------------------------------------
🔤 TEXT GENERATION CAPABILITY
------------------------------------------------------------

Status after finetuning: INTACT ✅

Tested:
✔ Comic book covers with title text
✔ "BLADE ZERO" title text
✔ "ANIME MONTHLY" magazine cover
✔ Issue numbers and dates

Notes:
→ Large text works very well
→ Small text slightly blurry (base limitation)
→ Occasional spelling errors (base model behavior)

------------------------------------------------------------
⚠ KNOWN LIMITATIONS
------------------------------------------------------------

Anatomy:
→ Extra fingers / malformed hands still occur
→ Floating limbs appear occasionally
→ Manageable with negative prompts
→ Known Z-Image Base issue, not a training fault

Style Consistency:
→ Base model produces anime style ~50% of the time
→ Finetuned model produces anime style consistently ✅

Details:
→ Best detail at CFG 5–6, Steps 35–40
→ ModelSamplingFlow Shift 3.0 is essential
→ Without the shift, results are noticeably worse
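
For intuition about what a shift of 3.0 does, the sketch below applies the time-shift formula commonly used with flow-matching models. This is an assumption: the report does not state which formula the ModelSamplingFlow node uses, and `shift_sigma` is a hypothetical name.

```python
def shift_sigma(sigma: float, shift: float = 3.0) -> float:
    """Remap a noise level in [0, 1]; shift > 1 keeps sampling at higher noise longer.

    Assumed formula: shift * sigma / (1 + (shift - 1) * sigma).
    """
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)

# Endpoints are unchanged, mid-range noise levels are pushed upward.
for sigma in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"{sigma:.2f} -> {shift_sigma(sigma):.3f}")
```

Under this assumption, a shift of 1.0 leaves the schedule untouched, while 3.0 spends more of the step budget on high-noise steps where overall composition is decided.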


------------------------------------------------------------
🚀 QUICK START SETTINGS
------------------------------------------------------------

Node:         ModelSamplingFlow → Shift 3.0
Sampler:      DPM++ 2M SDE  or  Euler Ancestral
Scheduler:    Simple
CFG:          5
Steps:        35
Resolution:   768x1024
Prompt style: Layered Fulltext
Negative:     Short keyword tags
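
The quick-start values can be written out as sampler inputs for a scripted workflow. This is a sketch under assumptions: the key names follow the inputs of ComfyUI's KSampler node, and the sampler and scheduler identifiers (`dpmpp_2m_sde`, `euler_ancestral`, `simple`) are the names ComfyUI is assumed to use for the display names in this report; verify them against your installation. `ksampler_inputs` is a hypothetical helper.

```python
import json

# Display names from the report mapped to assumed ComfyUI identifiers.
SAMPLER_IDS = {
    "DPM++ 2M SDE": "dpmpp_2m_sde",
    "Euler Ancestral": "euler_ancestral",
}

def ksampler_inputs(sampler: str = "DPM++ 2M SDE", seed: int = 0) -> dict:
    """Build KSampler-style inputs from the quick-start settings."""
    return {
        "seed": seed,
        "steps": 35,
        "cfg": 5.0,
        "sampler_name": SAMPLER_IDS[sampler],
        "scheduler": "simple",
        "denoise": 1.0,
    }

# Resolution and shift are set on other nodes; listed here for reference.
QUICK_START_EXTRA = {"width": 768, "height": 1024, "shift": 3.0}

print(json.dumps(ksampler_inputs(), indent=2))
```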
