anal x-ray (NetaYume Lumina 3.5)
Model description
I've been experimenting with this model, but I'm taking a break now to focus on other stuff.
I used Grok 4.1 to tag the images, gave the captions a quick check and tweak, and then started training.
I picked a set of images to train the "anal x-ray" concept. It turns out the image flipping augmentation was causing real problems: it produced broken limbs and made the captions stop matching the pictures (a flipped image mirrors left and right, but its caption still describes the original). Once I turned flipping off, the model started working and converged much faster.
It made me wonder whether image augmentation is even worth doing anymore. I think better-structured data and accurate captions matter far more.
Since I fixed the bug late, I ran the training for a few extra steps to compensate.
I started with 5,000 steps, a 2e-4 linear learning rate (LR), and a batch size of 4. Later, I switched the LR to a cosine decay (from 2e-4 down to 4e-5) and doubled the batch size to 8. I stopped at 12,000 steps based on how the sample images looked.
I ran the whole thing on vast.ai using ai-toolkit on an RTX Pro 6000 Blackwell (mostly to try out a bigger batch size).
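For reference, a cosine decay from 2e-4 down to 4e-5 follows the curve sketched below. This is a minimal sketch of the standard cosine formula, not ai-toolkit's own scheduler code. The card doesn't say how many steps the cosine phase covered, so `total_steps` is an assumption: the usage example guesses 7,000 (12,000 minus the initial 5,000).

```python
import math


def cosine_lr(step: int, total_steps: int, lr_max: float = 2e-4, lr_min: float = 4e-5) -> float:
    """Cosine decay from lr_max at step 0 to lr_min at total_steps."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp progress to [0, 1] so steps past the end stay at lr_min.
    t = min(max(step, 0), total_steps) / total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t))


if __name__ == "__main__":
    # Assumption: the cosine phase spanned 7,000 steps (not stated in the card).
    for s in (0, 1750, 3500, 5250, 7000):
        print(s, f"{cosine_lr(s, 7000):.2e}")
```

The LR starts at 2e-4, passes the midpoint value of 1.2e-4 halfway through, and flattens out near 4e-5 at the end. The slow start and finish are the usual reason people pick cosine over a linear ramp.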
You could try some related concepts, like...?
To avoid overfitting artifacts, I recommend lowering the weight: keep it between 0.8 and 0.9, and don't go above 1.
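Assuming this is a standard LoRA (the usual output of ai-toolkit concept training, though the card doesn't say so explicitly), the weight you set at inference scales the learned update added to the base model's weights. Here $s$ is that weight and $B$, $A$ are the trained low-rank matrices:

```latex
W' = W + s \, \Delta W, \qquad \Delta W = B A, \qquad 0.8 \le s \le 0.9
```

With $s < 1$ the concept is applied a bit more softly than it was trained, which tends to tame overfitting. With $s > 1$ the update is amplified beyond what training produced.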

