SoReal! - POV

Details

Model Description

Follow me on Patreon!


[SoReal! Portraits]

Overview

Reach your hands out for the stars! This model is the first in a series of Z-Image LORAs, aimed at bringing diversity in both concepts and humanity itself to Z-Image.

Compatibility & Usage

Due to its small size and rank, the model should have a minimal influence on the base model, further improving compatibility with other LORAs across Base/Turbo and indeed other checkpoints.

'Trigger words' aren't real - don't ask for one, just prompt normally. If you want a hand (literally), use 'a man's hand' or 'a woman's hand', which should normally get you what you want.

I'll upload a full concept list soon to show the range of concepts the model has been trained on - not confirming that the model is able to reproduce them, though.

When using Z-Image Turbo, strengths between 0.95 and 1.5 work best in my experience for V1, and 0.9 - 1.2 for V2.
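As a quick reference, the strength ranges above can be written down as a small lookup with a clamp. This is only a sketch: `RECOMMENDED_STRENGTH` and `clamp_strength` are hypothetical names, not part of any Z-Image tooling, and the ranges are simply the ones quoted above for Z-Image Turbo.

```python
# Recommended LoRA strength ranges on Z-Image Turbo, as quoted in the text.
# The dictionary and function names are illustrative, not a real API.
RECOMMENDED_STRENGTH = {
    "V1": (0.95, 1.5),
    "V2": (0.9, 1.2),
}


def clamp_strength(version: str, strength: float) -> float:
    """Clamp a requested strength into the recommended range for a version."""
    low, high = RECOMMENDED_STRENGTH[version]
    return max(low, min(high, strength))
```

For example, `clamp_strength("V2", 2.0)` falls back to the top of the V2 range. Values outside these ranges may still work; they are just outside what has been tested here.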

Limitations

Anatomy is still rough. I'm planning one further training run to try to address this for NSFW concepts, but that may mean a split between a generalisation model (V2) and an NSFW-specific model (V2-NSFW).

Future

Future iterations of this model should bring stronger prompt adherence, better anatomy, and improved general composition and quality through +/- reinforcement learning.

I am planning on finetuning Z-Image considerably with a model called 'SoReal!' (or, alternatively, ZoReal!). However, I want it to be the best amateur finetune possible. To achieve this, I have:

  1. Trained a custom quality model.

  2. Trained a custom one-shot demographic model (height, weight, skin tone, ethnicity, age in years, body shape) with an average accuracy of 89% for top-confidence prediction using ConvNeXt-XL.

  3. Finetuned wd-tagger-large-v3 on a large sample dataset of 50k hand-tagged images with human-assisted active learning.

  4. Fed those tagged images (with quality, demographics and general labels) with the image metadata (incl. EXIF & Camera Metadata) to Gemini 3 Flash for generating captions.

The goal: no over-trained LORAs baked in, no dramatic loss of generalisation, just a good, all-round, NSFW-ready, finetuned model.

I am now severely limited, however, by my compute and financial situation, so if you'd like to help make SoReal!, well, so real, then you can follow me on Patreon!

Dataset & Training

A dataset of 2500 images was sourced from a variety of sources. Deduplication and quality scoring (through MANIQA) lowered the dataset to around 1400.
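A rough sketch of what a deduplicate-then-filter pass can look like. Everything here is an assumption for illustration: the exact-byte SHA-256 comparison stands in for whatever deduplication was actually used, the quality scores are assumed to be precomputed (for example by a MANIQA model), and `dedupe_and_filter` and `min_score` are hypothetical names.

```python
import hashlib


def dedupe_and_filter(images, scores, min_score):
    """Drop exact duplicates, then keep images at or above a quality score.

    images: dict mapping file name -> raw bytes
    scores: dict mapping file name -> precomputed quality score (assumed)
    min_score: hypothetical quality threshold
    """
    seen = set()
    kept = []
    for name in sorted(images):
        digest = hashlib.sha256(images[name]).hexdigest()
        if digest in seen:
            continue  # byte-identical to an image already visited
        seen.add(digest)
        if scores[name] >= min_score:
            kept.append(name)
    return kept
```

Exact hashing only catches byte-identical files; near-duplicates (resized or re-encoded copies) would need a perceptual hash or an embedding comparison instead.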

This model was trained on a dataset of 1500 images at a batch size minimum of 10. Masked loss was implemented after roughly 40,000 samples (not steps) to improve anatomy & concept adherence.
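To make the sample-based schedule concrete, here is a minimal sketch of a loss that switches to masking once a sample count is reached. It is not the trainer's actual code: the use of mean squared error, the normalisation by the mask sum, and all function names are assumptions. Only the 40,000-sample threshold and the batch size of 10 come from the text.

```python
MASK_START_SAMPLES = 40_000  # from the text: masking begins after ~40,000 samples


def samples_to_steps(samples: int, batch_size: int) -> int:
    """Convert a sample count into optimizer steps for a fixed batch size."""
    return samples // batch_size


def mse_loss(pred, target, mask=None):
    """Mean squared error; with a mask, only weighted positions contribute."""
    if mask is None:
        mask = [1.0] * len(pred)
    weight = sum(mask)
    if weight == 0:
        return 0.0  # nothing to supervise in this example
    total = sum(m * (p - t) ** 2 for p, t, m in zip(pred, target, mask))
    return total / weight


def training_loss(pred, target, mask, samples_seen):
    """Use the plain loss early on, then the masked loss after the threshold."""
    use_mask = samples_seen >= MASK_START_SAMPLES
    return mse_loss(pred, target, mask if use_mask else None)
```

At a batch size of exactly 10, `samples_to_steps(40_000, 10)` gives 4,000 steps; since 10 is stated as a minimum, the real step count at the switch would be at most that.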

Validation loss was computed on a validation set 10% the size of the dataset to prevent overfitting while still maintaining strong concept adherence and generalisation.
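A small sketch of a 10% validation split, assuming the validation images are carved out of the same dataset (the text does not say whether they were held out or drawn separately). `split_dataset`, `val_fraction` and `seed` are illustrative names.

```python
import random


def split_dataset(items, val_fraction=0.1, seed=0):
    """Shuffle deterministically and split into (train, validation) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split stable
    n_val = round(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]
```

With 1500 images this yields 1350 for training and 150 for validation. Keeping the seed fixed means the validation set stays the same across runs, so validation loss curves remain comparable.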

Model was trained with AdamW through the Python adv-optm package.

Licensing

If you'd like to release a merge of this model, please contact me.

Made with <3 By BitcrushedHeart
