RayVietii-A
Details
Download Files
Model description
PixAI users who wish to use this model can do it by using the uploaded one on my PixAI profile from the link below :
https://pixai.art/@rayvietii/artworks/models
🌟 Based on and is my artwork My Instagram: https://www.instagram.com/ray_vietii to see the "art style"
This model is suck in shapes and hand, but it's mine. The style transfer is perfect. I'll improve it over the time. Try put noise background to the negative prompt.
Inference parameters:
Step: At least 8
CFG: 5 is recommended
Sampler: Euler, Euler a, DDIM preferred
Negative Prompt (opt): noise background
Try it out! https://pixai.art/model/1910312952549111802
🤔 A little look back.
DRm was literally my experimental subject, an introduction for me to get used to SD1.5 architecture, and SD in general. DRm is the foundation of this journey.
Papermae in other hand is supposedly the "my art style" model, but it failed.
For those who like something more technical, may this give you some insights.
This model have noise scheduler, let's call it HSC (Hard-Skip Clamping), it's a forward process only noise scheduler, my proposed HSC is a noise scheduler akin to DDPM. But HSC, instead of having 100% pure noise upon xT, it stop around 90%~, retaining 10% of the original signal. If anything, the analogy of HSC is like DDPM having an affair with Min-SNR-gamma. The quality of 4 steps is not the best, it's more of a "it can go that low and still produces coherence image" statement rather than it's producing good images, it is recommended to use 8 steps minimum for good images.
The training parameters i use:
Number of repeats: 15
Epoch: 10
UNet lr: 1.6e-4
text encoder lr: 6e-5
lr scheduler: cosine with 4 restarts
lr warmup: 0
Min-SNR-gamma: 0 (sorry T Hang et al, you've dissapointed me🤣🙏)
network dim ton alpha ratio: 1:1, a.k.a alpha = dim
=======================
Noise scheduler setting:
β start = 0.0003
β end = 0.006016
t = 800
Clip Skip = 1
=======================
Other:
dataset image count: 25
style regularization count: 4
min bucket reso: 128
max bucket reso: 4095
Clip Skip: 2
With the standard Ho's recommendation; β start=0.0001 and end=0.002, t=1000, it's reaching high noise upon xT=600~, which 400 more steps of basically just pure noise with less than 5% of original signal.
And you might ask, "then why not use t=600?", no, because it's still need to learn how to denoise, to guess a noise. You can see my β end is super specific, because it's upon xT=780, it's reached a pure noise, and the remaining 20 timesteps are the "guessing" part.
I utilize is_reg, or a style regularization to reinforce the style.
v
There's a custom parameter in my training wrapper script that utilizes subfolder. Let's call it as "style", where style=num_repeat * multiplier, these belows are
From left to right:
1. style: 025
2. style: 0.5
3. style: 1.0
4. style: 2.0
I honestly can't conclude, in term of style, all of the 4 captured it. But upon testing it, alot, using different prompts, I've concluded that 2.0 is indeed generally much better.




















