Lab #8 : lr-gotcha - the concept of getting caught being naughty


Disclaimers

This LoRA series shares intermediate, experimental LoRAs produced while learning LoRA training.

These LoRAs are marked with the *BAD* prefix. Post-run sample images will not be provided for them.

If you are interested, you can download them to see how they perform, or refer to the benchmark content in the appendix.

Intro

This LoRA trains a concept: a girl doing naughty things and getting caught by her boyfriend / husband.

Remember to switch to the right model version.

Changelog

250427

  • Lab_8_12 updated (base model is Janku v3)

  • Retrained the LoRA with the fixed reg image set

Findings

  1. Data Preprocessing: the single-line caption format is critical (the docs say so, and the results clearly show the difference); see the sketch after this list

  2. Trigger Token: the "inset" tag effectively captures the target concept, and precise selection matters (picking a trigger token the base model already understands helps guide training)

  3. Regularization: simple but high-quality images work best; complex images make the concept harder for the model to learn

  4. Regularization captions: we do need them (I checked the sd-scripts Python code, and it does process captions for the reg image set; interestingly, the sd-scripts docs say captions are not needed for reg images, which probably refers to class-based training). Auto-tagging with a 0.4 threshold works

  5. Dataset Quality: a curated high-quality set (28 images); using SD itself to generate high-quality dataset images pays off (always apply highres fix and ADetailer to both your dataset and your reg set)

  6. Base Model: janku v3 produces more stable outputs with fewer artifacts
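
As a concrete reference for finding 1, here is a minimal sketch (with hypothetical paths) of flattening multi-line caption files into the single-line, comma-separated format sd-scripts expects:

# Flatten multi-line .txt captions into one comma-separated line.
# dataset_dir is a placeholder; point it at your own dataset folder.
from pathlib import Path

dataset_dir = Path("lora_lab_8_1_gotcha/dataset")

for txt in dataset_dir.glob("*.txt"):
    lines = txt.read_text(encoding="utf-8").splitlines()
    tags = [t.strip() for line in lines for t in line.split(",") if t.strip()]
    # One line of comma-separated tags; with keep_tokens=2 the first two tags
    # (the trigger tokens) stay fixed while shuffle_caption shuffles the rest.
    txt.write_text(", ".join(tags), encoding="utf-8")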

General training info

NOTE: for basic training info, refer to lab_8_1 below (the later labs are almost the same)

  1. Lab 1 - Initial Setup

    • Base Model: wai v13

    • Dataset: 5 pics (24 repeats × 8 epochs); see the folder layout sketch after this list

    • Trigger Tokens: lr-gotcha, split frame

    • Captions: One line

    • Regularization: None

    • Notes: Initial attempt

  2. Lab 2 - Dataset Expansion with Error

    • Base Model: wai v13

    • Dataset: 8 pics (24r × 8ep), new dataset

    • Trigger Tokens: lr-gotcha, comic

    • Captions: Multi-line (incorrect format)

    • Regularization: None

    • Notes: Caption format error discovered later

  3. Lab 3 - Foundation with Regularization

    • Base Model: wai v13

    • Dataset: 8 pics (24r × 8ep)

    • Trigger Tokens: lr-gotcha, 1boy in frame

    • Captions: One line (corrected)

    • Regularization: lab_8_3_reg (1girl normal, 1girl gets caught)

    • Notes: First successful regularization implementation

  4. Lab 4 - Trigger Change with Regression

    • Base Model: wai v13

    • Dataset: 8 pics (24r × 8ep)

    • Trigger Tokens: lr-gotcha, one inset boy

    • Captions: Multi-line (regression)

    • Regularization: Same as Lab 3

    • Notes: Reverted to incorrect caption format

  5. Lab 5 - Caption Format Correction

    • Base Model: wai v13

    • Dataset: 8 pics (24r × 8ep)

    • Trigger Tokens: lr-gotcha, one inset boy

    • Captions: One line (corrected again)

    • Regularization: Same as Lab 3

    • Notes: Re-established proper caption format

  6. Lab 6 - Key Discovery ('inset' tag)

    • Base Model: wai v13

    • Dataset: 8 pics (24r × 8ep)

    • Trigger Tokens: lr-gotcha, inset

    • Captions: One line

    • Regularization: lab_8_3_reg + 1boy pointing

    • Notes: 'inset' tag proved effective, quality improved

  7. Lab 7 - Complex Regularization Experiment

    • Base Model: wai v13

    • Dataset: 8 pics (24r × 8ep)

    • Trigger Tokens: lr-gotcha, inset

    • Captions: One line

    • Regularization: Script-generated from dataset tags

    • Notes: Complex reg images led to poor quality

  8. Lab 8 - Regularization Caption Removal

    • Base Model: wai v13

    • Dataset: 8 pics (24r × 8ep)

    • Trigger Tokens: lr-gotcha, inset

    • Captions: One line

    • Regularization: Same as Lab 7, no captions

    • Notes: Still suboptimal results

  9. Lab 9 - Individual Tagging Attempt

    • Base Model: wai v13

    • Dataset: 8 pics (24r × 8ep)

    • Trigger Tokens: lr-gotcha, inset

    • Captions: One line

    • Regularization: Same as Lab 7, individually tagged

    • Notes: Complex programmatic reg generation unstable

  10. Lab 10 - Breakthrough with Quality Improvements

    • Base Model: wai v13

    • Dataset: 28 pics (8r × 8ep), curated

    • Trigger Tokens: lr-gotcha, inset

    • Captions: One line

    • Regularization: Script-generated simple images (6 categories)

    • Notes: High-quality dataset + simple reg images = success

  11. Lab 11 - Base Model Validation

    • Base Model: janku v3

    • Dataset: 28 pics (8r × 8ep), same as Lab 10

    • Trigger Tokens: lr-gotcha, inset

    • Captions: One line

    • Regularization: Same as Lab 10

    • Notes: janku v3 produced more stable results
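
As referenced in Lab 1, a quick sketch of the dataset folder layout. sd-scripts' DreamBooth-style directories encode the repeat count in the subfolder name ("24_lr-gotcha" means each image in that folder is repeated 24 times per epoch); all names here are illustrative:

# Illustrative sd-scripts layout; folder names are hypothetical.
from pathlib import Path

root = Path("lora_lab_8_1_gotcha")
# 5 training pics plus their .txt captions, repeated 24 times per epoch
(root / "dataset" / "24_lr-gotcha").mkdir(parents=True, exist_ok=True)
# regularization images (used from Lab 3 onward), typically 1 repeat
(root / "reg" / "1_1girl").mkdir(parents=True, exist_ok=True)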

Training Process Overview

  1. Initiation: Basic Setup

    • Action: Initial training attempt using the wai illustrious v13 model, a small dataset (5 images), specific trigger tokens (lr-gotcha, split frame), correctly formatted single-line captions, and no regularization images.

    • Finding: Provided a starting point but lacked the sophistication (like regularization) needed for a high-quality, controllable LoRA.

  2. Early Steps: Dataset Expansion & Critical Preprocessing Error

    • Action: Increased the dataset size (8 images) and changed the trigger words (lr-gotcha, comic).

    • Finding: The captions were mistakenly written in a multi-line format; the error was only discovered later.

  3. Foundation: Introducing Regularization and Correct Data Handling

    • Action: Introduced the first set of regularization images (lab_8_3_reg). Critically, corrected the caption format issue from Lab 2, ensuring tags were comma-separated on a single line. Changed the trigger token (lr-gotcha, 1boy in frame).

    • Finding: The first successful regularization implementation.

  4. Experiment: New Trigger, Accidental Regression

    • Action: Changed the trigger token to lr-gotcha, one inset boy. However, mistakenly reverted to using multi-line captions instead of the required single-line format. Used the same regularization as Lab 3.

    • Finding: The caption-format regression hurt training; it was corrected in the next lab.

  5. Course Correction: Reinstating Correct Caption Formatting

    • Action: Corrected the mistake from Lab 4 by ensuring captions were properly formatted as a single, comma-separated line. Kept the trigger token (lr-gotcha, one inset boy) and regularization from the previous step.

    • Finding: Underscored the importance of adhering to technical requirements like caption formatting for effective training.

  6. Key Discovery: The 'inset' Trigger Token

    • Action: Changed the trigger token to lr-gotcha, inset, based on the Lab 8_1 observation that the base model already understands "inset". Slightly expanded the regularization set (added 1boy pointing images).

    • Finding: The inset tag proved effective at capturing the desired visual concept. This discovery marked a clear improvement in the LoRA's ability to generate the intended style, demonstrating the importance of precise trigger words. With the regularization set also behaving as intended, quality saw a noticeable jump.

  7. Experiment: Automated, Complex Regularization Image Generation

    • Action: Implemented a novel regularization strategy using a script. The script generated regularization images by feeding dataset image tags (excluding the triggers) into the base model, keeping the original captions; a sketch of this approach follows this overview.

    • Finding: This method produced regularization images that were too visually complex, leading to instability and poor LoRA performance.

  8. Refining Complex Regularization (Caption Removal)

    • Action: Kept the complex regularization images from Lab 7 but removed all their text captions, hypothesizing that the captions might be adding noise.

    • Finding: Captions for regularization images do matter; removing them did not help (results stayed suboptimal).

  9. Refining Complex Regularization (Individual Tagging)

    • Action: Continued using the script-generated complex regularization images from Lab 7. Attempted refinement by applying automated tagging (0.4 threshold) individually to each regularization image; a tagger sketch follows this overview.

    • Finding: The complex, programmatically generated regularization set remained unstable.

  10. Breakthrough: Synergizing Dataset Quality and Simplified Regularization

    • Action: Undertook a major dataset overhaul – expanded to 28 images, carefully selected/generated for quality using a base model and manual editing (Photoshop). Simultaneously, simplified the regularization approach, using a script to create 6 categories of high-quality, simple composition images with post-processing (highres fix, adetailer).

    • Finding: This combined approach of high-quality, targeted training data and high-quality, relevant but simple regularization images was the key to achieving the desired LoRA quality.

  11. Validation with a Superior Base Model (janku v3)

    • Action: Replicated the successful configuration from Lab 10 (curated dataset, simplified regularization, inset trigger) but utilized the janku v3 base model.

    • Finding: The janku v3 model improved output stability and reduced image artifacts compared to the previous base model (though I don't yet know why).
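
As mentioned in step 7, a minimal sketch of the script-generated regularization idea: feed each dataset image's tags (minus the triggers) back into the base model and keep the caption. It assumes a local AUTOMATIC1111 webui running with --api enabled; all paths and generation settings are illustrative:

# Generate one reg image per dataset caption via the webui txt2img API.
import base64
from pathlib import Path

import requests

dataset_dir = Path("lora_lab_8_7_gotcha/dataset")   # hypothetical paths
reg_dir = Path("lora_lab_8_7_gotcha/reg")
reg_dir.mkdir(parents=True, exist_ok=True)
TRIGGERS = {"lr-gotcha", "inset"}

for i, txt in enumerate(sorted(dataset_dir.glob("*.txt"))):
    tags = [t.strip() for t in txt.read_text(encoding="utf-8").split(",")]
    prompt = ", ".join(t for t in tags if t and t not in TRIGGERS)  # drop triggers
    resp = requests.post(
        "http://127.0.0.1:7860/sdapi/v1/txt2img",
        json={"prompt": prompt, "steps": 28, "width": 832, "height": 1280},
    )
    resp.raise_for_status()
    (reg_dir / f"reg_{i:03d}.png").write_bytes(base64.b64decode(resp.json()["images"][0]))
    (reg_dir / f"reg_{i:03d}.txt").write_text(prompt, encoding="utf-8")  # keep the caption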
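And for step 9, a hedged sketch of auto-tagging the regularization images at the 0.4 threshold with sd-scripts' WD14 tagger (verify the script path and flags against your own sd-scripts checkout; the folder name is hypothetical):

# Run sd-scripts' WD14 tagger over a reg image folder.
import subprocess

subprocess.run(
    [
        "python", "finetune/tag_images_by_wd14_tagger.py",
        "--thresh", "0.4",              # the 0.4 threshold from the findings
        "--caption_extension", ".txt",
        "lora_lab_8_9_gotcha/reg",      # hypothetical reg image folder
    ],
    check=True,
)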

Initial step -- Lab 8_1 (the basic training params & goal)

goal & findings

  • goal

    • This is the first training. The goal is to quickly check whether the SD model can learn the general idea.

      • The answer is YES, but not as well as I expected.
    • Dataset source method:

      • First, use the base model to generate 2 pics: 1girl surprised & 1boy pointing

      • Then use Photoshop to synthesize split-pane images (note: in this version the composition is not an inset; the 1girl image is split in half, and the 1boy is placed in the middle)

  • findings

    • The chaotic layout leads the LoRA to produce illogical local compositions at inference time.

      • Although individual local areas (when logical) have good quality, the overall value of the LoRA at this experimental stage is very low.

      • This at least confirms that the dataset I made is of reasonable quality.

    • If there are only a few images, try to use tags that the base model already "understands" to guide LoRA training, rather than using a tag it doesn't understand well.

      • I accidentally found that the "inset" tag is more appropriate.
    • In the dataset at this stage, the image structure is top (1girl), middle (1boy), bottom (1girl). Because the parts are completely separated (my guess), the model cannot associate the upper and lower halves well during LoRA training, so generated split-composition images often look like the top and bottom come from two different images.

      • This is also why I focus on "inset". An inset does not completely separate the main background within the overall image, allowing the model to learn the "background" as a whole.

training params

# Model & Data
pretrained_model_name_or_path="... /wai_i13.safetensors"
# image count = 5 pics
train_data_dir="... /lora_lab_8_1_gotcha/dataset"
# reg image count = 0
reg_data_dir=""

# Training
max_train_epochs=8
train_batch_size=1
seed=1861
resolution="832,1280"
max_bucket_reso=1280
min_bucket_reso=256
bucket_reso_steps=64
bucket_no_upscale=true
enable_bucket=true
cache_latents_to_disk=true
persistent_data_loader_workers=true
shuffle_caption=true
caption_extension=".txt"
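# keep_tokens=2 pins the two trigger tokens at the caption start while shuffle_caption shuffles the rest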
keep_tokens=2
max_token_length=225

# LoRA Network
network_module="networks.lora"
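# alpha/dim = 8/16 = 0.5 scales down the effective strength of the LoRA updates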
network_dim=16
network_alpha=8

# Optimizer & LR
learning_rate=0.0001
unet_lr=0.0001
text_encoder_lr=0.00001
optimizer_type="AdamW8bit"
lr_scheduler="cosine_with_restarts"
lr_warmup_steps=0
lr_scheduler_num_cycles=1

# Precision & Device
mixed_precision="bf16"
xformers=true
lowram=true

# Saving & Logging
save_model_as="safetensors"
save_precision="fp16"
save_every_n_epochs=2
log_with="tensorboard"
log_prefix="lora_lab_8_1_gotcha--"
output_name="lora_lab_8_1_gotcha"

# Loss
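# weights the loss on regularization images; inert here since reg_data_dir is empty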
prior_loss_weight=1
