Lab @8 : lr-gotcha - concept of getting caught when being naughty
Details
Download Files
About this version
Model description
Disclaimers
This LORA series will provide some intermediate training experimental LORAs for sharing LORA training learning.
These LoRA will be marked as *BAD*as the prefix. These will not specifically provide post-run images.
If you are interested, you can download and try to see how they perform, or refer to some benchmark content in the appendix.
Intro
This LoRA is a concept of : a girl doing naughty things and being caught by her boyfriend / husband
Switch to the right model
/model/1277670/janku-v30-noobai-eps-rouwei-nsfw-illustrious-xl
lab_8_11, lab_8_12,
- is trained on this base model, and almost all generated images are not messed up
Changelog
250427
Lab_8_12, updated ( base model is Janku v3 )
Use the fixed reg image set and re-train the LoRA
result: the generation becomes more stable
And oh, I know why the heck it was so messy at the first place ... !! ( if you are interested in training your own LoRA, you can ref to this LoRA's findings for more information about the mistakes I made : /model/1493162/lab-10-nude-pantyhose-natural-pantyhose-with-cotton-or-shiny-controlling )
Findings
Data Preprocessing: Single-line caption format is critical (as the doc said so, and the result does show the difference)
Trigger Token: "inset" tag effectively captures target concept, precise selection matters (which means that, picking better trigger token to guide training is somehow important)
Regularization: Simple, but high-quality images work best; complex images causes trouble for ai to understand
Regulariztion captions: we do need this (I have check the sd-script python script, it does processing captions of reg image set. Interestingly, I found the doc of sd-script said that we don't need captions for reg? maybe it's referring to class-training lora), auto tagging (with 0.4 threshold can do)
Dataset Quality: Curated high-quality set (28 images), it's a good point of utilizing sd itself for generating high quality images (always highres fix & adetailer fix for your dataset & reg set)
Base Model: janku v3 produces more stable outputs with fewer artifacts
General training info
NOTE: basic info can ref to lab_8_1 (almost the same)
Lab 1 - Initial Setup
Base Model: wai v13
Dataset: 5 pics (24 repeats × 8 epochs)
Trigger Tokens: lr-gotcha, split frame
Captions: One line
Regularization: None
Notes: Initial attempt
Lab 2 - Dataset Expansion with Error
Base Model: wai v13
Dataset: 8 pics (24r × 8ep), new dataset
Trigger Tokens: lr-gotcha, comic
Captions: Multi-line (incorrect format)
Regularization: None
Notes: Caption format error discovered later
Lab 3 - Foundation with Regularization
Base Model: wai v13
Dataset: 8 pics (24r × 8ep)
Trigger Tokens: lr-gotcha, 1boy in frame
Captions: One line (corrected)
Regularization: lab_8_3_reg (1girl normal, 1girl gets caught)
Notes: First successful regularization implementation
Lab 4 - Trigger Change with Regression
Base Model: wai v13
Dataset: 8 pics (24r × 8ep)
Trigger Tokens: lr-gotcha, one inset boy
Captions: Multi-line (regression)
Regularization: Same as Lab 3
Notes: Reverted to incorrect caption format
Lab 5 - Caption Format Correction
Base Model: wai v13
Dataset: 8 pics (24r × 8ep)
Trigger Tokens: lr-gotcha, one inset boy
Captions: One line (corrected again)
Regularization: Same as Lab 3
Notes: Re-established proper caption format
Lab 6 - Key Discovery ('inset' tag)
Base Model: wai v13
Dataset: 8 pics (24r × 8ep)
Trigger Tokens: lr-gotcha, inset
Captions: One line
Regularization: lab_8_3_reg + 1boy pointing
Notes: 'inset' tag proved effective, quality improved
Lab 7 - Complex Regularization Experiment
Base Model: wai v13
Dataset: 8 pics (24r × 8ep)
Trigger Tokens: lr-gotcha, inset
Captions: One line
Regularization: Script-generated from dataset tags
Notes: Complex reg images led to poor quality
Lab 8 - Regularization Caption Removal
Base Model: wai v13
Dataset: 8 pics (24r × 8ep)
Trigger Tokens: lr-gotcha, inset
Captions: One line
Regularization: Same as Lab 7, no captions
Notes: Still suboptimal results
Lab 9 - Individual Tagging Attempt
Base Model: wai v13
Dataset: 8 pics (24r × 8ep)
Trigger Tokens: lr-gotcha, inset
Captions: One line
Regularization: Same as Lab 7, individually tagged
Notes: Complex programmatic reg generation unstable
Lab 10 - Breakthrough with Quality Improvements
Base Model: wai v13
Dataset: 28 pics (8r × 8ep), curated
Trigger Tokens: lr-gotcha, inset
Captions: One line
Regularization: Script-generated simple images (6 categories)
Notes: High-quality dataset + simple reg images = success
Lab 11 - Base Model Validation
Base Model: janku v3
Dataset: 28 pics (8r × 8ep), same as Lab 10
Trigger Tokens: lr-gotcha, inset
Captions: One line
Regularization: Same as Lab 10
Notes: janku v3 produced more stable results
Training Process Overview
Initiation: Basic Setup
Action: Initial training attempt using the
wai illustrious v13model, a small dataset (5 images), specific trigger tokens (lr-gotcha, split frame), correctly formatted single-line captions, and no regularization images.Finding: Provided a starting point but lacked the sophistication (like regularization) needed for a high-quality, controllable Lora.
Early Steps: Dataset Expansion & Critical Preprocessing Error
- Action: Increased the dataset size (8 images) and changed trigger words (
lr-gotcha, comic).
- Action: Increased the dataset size (8 images) and changed trigger words (
Foundation: Introducing Regularization and Correct Data Handling
- Action: Introduced the first set of regularization images (
lab_8_3_reg). Critically, corrected the caption format issue from Lab 2, ensuring tags were comma-separated on a single line. Changed trigger token (lr-gotcha, 1boy in frame).
- Action: Introduced the first set of regularization images (
Experiment: New Trigger, Accidental Regression
- Action: Changed the trigger token to
lr-gotcha, one inset boy. However, mistakenly reverted to using multi-line captions instead of the required single-line format. Used the same regularization as Lab 3.
- Action: Changed the trigger token to
Course Correction: Reinstating Correct Caption Formatting
Action: Corrected the mistake from Lab 4 by ensuring captions were properly formatted as a single, comma-separated line. Kept the trigger token (
lr-gotcha, one inset boy) and regularization from the previous step.Finding: Underscored the importance of adhering to technical requirements like caption formatting for effective training.
Key Discovery: The 'inset' Trigger Token
Action: Changed the trigger token to
lr-gotcha, insetbased on an observation or hypothesis. Slightly expanded the regularization set.Finding: The
insettag proved effective for capturing the desired visual concept. This discovery marked improvement in the Lora's ability to generate the intended style, demonstrating the importance of precise trigger words. And the reg set is ok? so quality saw a noticeable jump.
Experiment: Automated, Complex Regularization Image Generation
Action: Implemented a novel regularization strategy using a script. The script generated regularization images by feeding dataset image tags (excluding triggers) into the base model, keeping the original captions.
Finding: This method produced regularization images that were too visually complex, leading to instability and poor Lora performance.
Refining Complex Regularization (Caption Removal)
Action: Still using the complex regularization images from Lab 7, removed all associated text captions from them, hypothesizing that the captions might be adding noise.
Finding: captions of regularization images did is important.
Refining Complex Regularization (Individual Tagging)
- Action: Continued using the script-generated complex regularization images (from Lab 7). Attempted refinement by applying automated tagging (0.4 threshold) individually to each regularization image.
Breakthrough: Synergizing Dataset Quality and Simplified Regularization
Action: Undertook a major dataset overhaul – expanded to 28 images, carefully selected/generated for quality using a base model and manual editing (Photoshop). Simultaneously, simplified the regularization approach, using a script to create 6 categories of high-quality, simple composition images with post-processing (highres fix, adetailer).
Finding: This combined approach of high-quality, targeted training data and high-quality, relevant but simple regularization images was the key to achieving the desired Lora quality.
Validation with a Superior Base Model (
janku v3)Action: Replicated the successful configuration from Lab 10 (curated dataset, simplified regularization,
insettrigger) but utilized thejanku v3base model.Finding: The
janku v3model improved output stability and reduced image artifacts compared to the previous base model. (but i don't know why for now)
Initital step -- Lab 8_1 ( the basic training params & goal )
goal & findings
goal
This is the first training. The goal is to quickly check whether the SD model can learn the general idea.
- The answer is YES, but not as good as I expect.
Dataset source method:
First, use the base model to generate 2 pics of : 1girl surprised & 1boy pointing
Use Photoshop to synthesize images with split panes (Note: in this version, it is not processed as
inset, but the image of 1girl is split in half, and then the 1boy is placed in the middle)
findings
The chaotic layout leads to LoRA producing illogical local compositions during use.
Although other local areas (if logical) have "good quality", the overall value of LoRA at this experimental stage is very low.
This confirms me that the dataset I made is at least of a reasonable quality.
If there are only a few images, try to use tags that the base model already "understands" to guide LoRA training, rather than using a tag it doesn't understand well.
- I accidentally found that the "inset" tag is more appropriate.
In the dataset given at this stage, the image structure is top (1girl), middle (1boy), bottom (1girl). Because they are completely separated (my guess), the model cannot associate the upper and lower parts well during LoRA training, so when generating split composition images, the upper and lower parts often look like they come from two different images.
- This is also why I focus on "inset". Inset does not completely separate the main background in the overall image, allowing the model to better learn the "background" as a whole.
training params
# Model & Data
pretrained_model_name_or_path="... /wai_i13.safetensors"
# image count = 5 pics
train_data_dir="... /lora_lab_8_1_gotcha/dataset"
# reg image count = 0
reg_data_dir=""
# Training
max_train_epochs=8
train_batch_size=1
seed=1861
resolution="832,1280"
max_bucket_reso=1280
min_bucket_reso=256
bucket_reso_steps=64
bucket_no_upscale=true
enable_bucket=true
cache_latents_to_disk=true
persistent_data_loader_workers=true
shuffle_caption=true
caption_extension=".txt"
keep_tokens=2
max_token_length=225
# LoRA Network
network_module="networks.lora"
network_dim=16
network_alpha=8
# Optimizer & LR
learning_rate=0.0001
unet_lr=0.0001
text_encoder_lr=0.00001
optimizer_type="AdamW8bit"
lr_scheduler="cosine_with_restarts"
lr_warmup_steps=0
lr_scheduler_num_cycles=1
# Precision & Device
mixed_precision="bf16"
xformers=true
lowram=true
# Saving & Logging
save_model_as="safetensors"
save_precision="fp16"
save_every_n_epochs=2
log_with="tensorboard"
log_prefix="lora_lab_8_1_gotcha--"
output_name="lora_lab_8_1_gotcha"
# Loss
prior_loss_weight=1

















