Simulacrum V3-V38 [F1D/F1DD/F1D2] [SFW/NSFW]
Model description
Prompting Safe Version:
steps: 50
cfg 1, distilled cfg 3.5-5
euler < simple/normal
Use the rule of 3s to prompt in plain English, then add booru tags anywhere you see fit. Booru tags don't behave like natural language.
an apple on a table in a room, one, two, three.
an apple on a table in a room. a room in a house in a city. a city in a state in a country.
Stick with that, and you should be fine for natural language.
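The chained pattern above can be sketched in a few lines of Python. This is only an illustration of the pattern as it reads from the example; the function name `rule_of_three` and its arguments are hypothetical and not part of any Simulacrum tooling.

```python
def rule_of_three(items, relations):
    """Chain items into sentences of three things each.

    The last item of one sentence is reused as the first item of the next,
    mirroring the "apple / table / room" example prompt.
    """
    if len(relations) != len(items) - 1:
        raise ValueError("need exactly one relation between each pair of items")
    if len(items) < 3 or len(items) % 2 == 0:
        raise ValueError("need an odd number of items, at least three")
    sentences = []
    for start in range(0, len(items) - 2, 2):
        first, second, third = items[start:start + 3]
        rel_a, rel_b = relations[start:start + 2]
        sentences.append(f"{first} {rel_a} {second} {rel_b} {third}.")
    return " ".join(sentences)


prompt = rule_of_three(
    ["an apple", "a table", "a room", "a house", "a city", "a state", "a country"],
    ["on", "in", "in", "in", "in", "in"],
)
print(prompt)
```

Booru tags would then be added before or after the returned string, as in the example prompts.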
example prompt 1;
safe, anime,
a girl sitting on a giant apple in a room

safe, anime,
a sticker of a sitting girl on the side of a giant apple in a room
---
safe, anime,
a sticker of a frog costume wearing girl stuck to the side of a 3d apple on top of a table, apple sticker, purple hair, text in sticker "wibbit!"

an apple being invaded by aliens on a table in a room. a room in a house in a city. a city in a state in a country.

(an apple being invaded by aliens:1.2) on a table in a room. (a room made of jello:1.3) in a wool house in a city. (a wool city:1.3) in a state in a country.

safe,
a sticker of a frog costume wearing girl stuck to the side of a 3d apple on top of a table, apple sticker, purple hair, text in sticker "wibbit!",
(an apple being invaded by aliens:1.2) on a table in a room. (a room made of jello:1.3) in a wool house in a city. (a wool city:1.3) in a state in a country.

As you can see, the rule of 3 applies until about the 5th degree of separation, then it starts to break down and bleed together. This is standard for AI.
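The `(text:1.2)` spans in the prompts above look like the emphasis notation that several Stable Diffusion front ends accept. How a given UI applies the weight is up to that UI, so the snippet below is only a small sketch for listing the weighted spans in a prompt; the regex and the function name `weighted_spans` are assumptions made for illustration.

```python
import re

# Matches "(some text:1.3)"; nested parentheses are deliberately not handled.
WEIGHTED = re.compile(r"\(([^():]+):([0-9]*\.?[0-9]+)\)")


def weighted_spans(prompt):
    """Return a list of (text, weight) pairs found in the prompt."""
    return [(m.group(1).strip(), float(m.group(2))) for m in WEIGHTED.finditer(prompt)]


example = (
    "(an apple being invaded by aliens:1.2) on a table in a room. "
    "(a room made of jello:1.3) in a wool house in a city. "
    "(a wool city:1.3) in a state in a country."
)
for text, weight in weighted_spans(example):
    print(weight, text)
```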
Don't get too attached to the NSFW version~!:
I'm beginning a full NSFW retrain using 50,000 high quality, high fidelity, realistic, 3d, and anime images; roughly 5k per category. These two packs are like oil and water, and I need them both mixable without a catalyst.
Starting with the final stretch to version 4, all future trains will be tagged using a new tagging style alongside the old style, which includes offset detection for individual tagging.
upper-left, upper-center, upper-right,
middle-left, middle-center, middle-right
lower-left, lower-middle, lower-right
- These tags were specifically chosen for their avoidance of standard booru concepts, as well as having overlap within the T5 for offset association within the scene.
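As a rough sketch of how a detected subject could be mapped to one of the nine offset tags: the tag names are copied from the list above, while the 3x3 grid split on the bounding-box center, the function name `offset_tag`, and the box format are assumptions. The actual offset detection is not described here.

```python
# Tag names exactly as listed above, arranged as rows of a 3x3 grid.
OFFSET_TAGS = [
    ["upper-left", "upper-center", "upper-right"],
    ["middle-left", "middle-center", "middle-right"],
    ["lower-left", "lower-middle", "lower-right"],
]


def offset_tag(box, image_width, image_height):
    """Return the grid tag for the center of box = (left, top, right, bottom) in pixels."""
    left, top, right, bottom = box
    center_x = (left + right) / 2
    center_y = (top + bottom) / 2
    # Clamp to 0..2 so boxes touching or crossing the border still get a tag.
    col = min(max(int(3 * center_x / image_width), 0), 2)
    row = min(max(int(3 * center_y / image_height), 0), 2)
    return OFFSET_TAGS[row][col]


print(offset_tag((300, 700, 600, 900), 900, 900))
```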
size tags
full-frame
moderate
minimal
- These three tags will be used in conjunction with the offset tags to ensure solidity within the images. Some of them are paired with booru tags so that they intentionally bleed into the current training.
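One way the size tags could be assigned is by how much of the frame a subject covers. The cutoffs below (0.6 and 0.2 of the image area) are invented purely for illustration, as is the function name `size_tag`; the real thresholds are not stated.

```python
def size_tag(box, image_width, image_height, full_frame_min=0.6, moderate_min=0.2):
    """Return a size tag from the fraction of the image covered by the box.

    box = (left, top, right, bottom) in pixels. Both cutoffs are hypothetical.
    """
    left, top, right, bottom = box
    box_area = max(right - left, 0) * max(bottom - top, 0)
    coverage = box_area / (image_width * image_height)
    if coverage >= full_frame_min:
        return "full-frame"
    if coverage >= moderate_min:
        return "moderate"
    return "minimal"


print(size_tag((0, 0, 500, 500), 1000, 1000))
```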
aesthetic tags
disgusting < 5%
very displeasing < 20%
displeasing < 35%
< 50%
aesthetic < 65%
very aesthetic < 85%
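The bands above can be read as a simple lookup. In this sketch the percentages are treated as upper bounds on a 0-100 score, which is an assumption. The "< 50%" band has no tag name in the list and nothing is given for 85% and above, so both return `None` here rather than a guessed tag.

```python
# (upper bound in percent, tag) pairs in the order listed above.
AESTHETIC_BANDS = [
    (5, "disgusting"),
    (20, "very displeasing"),
    (35, "displeasing"),
    (50, None),  # listed as "< 50%" with no tag name
    (65, "aesthetic"),
    (85, "very aesthetic"),
]


def aesthetic_tag(score_percent):
    """Return the tag for the first band the score falls under, or None."""
    for upper, tag in AESTHETIC_BANDS:
        if score_percent < upper:
            return tag
    return None  # 85% and above is not covered by the list


print(aesthetic_tag(60))
```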
pruning
monochrome
greyscale
invalid images
removed tags
"tagme", "bad pixiv id", "bad source", "bad id", "bad tag", "bad translation", "untranslated*", "translation*", "larger resolution available", "source request", "*commentary*", "video", "animated", "animated gif", "animated webm", "protected link", "paid reward available", "audible music", "sound", "60+fps", "artist request", "collaboration request", "original", "girl on top", "boy on top",- These are not useful tags. My tagging system has wildcard capability for tag removal and utility for inclusion or removal from the tagging.
Template:
"{rating}", "{core}", "{artist}", "{characters}", "{character_count}", "{gender}", "{species}", "{series}", "{photograph}", "{substitute}", "{general}", "{unknown}", "{metadata}", "{aesthetic}"This template is based on a complete compounded list of tags from;
- safebooru, gelbooru, danbooru, e621, rule34xxx, rule34paheal, rule34us
All nonmatching aliases are normalized into a singular tag.
Any tags not present in these lists are moved automatically to unknown.
All captions are automatically placed on top of all of these tags.
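A minimal sketch of how the template ordering, alias normalization, and the "unknown" fallback could fit together. The category names come from the template above; the lookup dictionaries, the sample tags, and the function name `build_caption` are illustrative assumptions.

```python
TEMPLATE = [
    "rating", "core", "artist", "characters", "character_count", "gender",
    "species", "series", "photograph", "substitute", "general", "unknown",
    "metadata", "aesthetic",
]


def build_caption(captions, tags, category_of, aliases=None):
    """Place captions first, then tags grouped in template order.

    category_of maps a normalized tag to a template category; tags missing
    from it fall into "unknown". aliases maps variant spellings to one tag.
    """
    aliases = aliases or {}
    buckets = {name: [] for name in TEMPLATE}
    for tag in tags:
        tag = aliases.get(tag, tag)
        category = category_of.get(tag, "unknown")
        if category not in buckets:
            category = "unknown"
        if tag not in buckets[category]:  # skip duplicates created by aliasing
            buckets[category].append(tag)
    ordered = [tag for name in TEMPLATE for tag in buckets[name]]
    return ", ".join(list(captions) + ordered)


category_of = {"safe": "rating", "1girl": "character_count",
               "purple hair": "general", "very aesthetic": "aesthetic"}
aliases = {"purple_hair": "purple hair"}
print(build_caption(
    ["a girl sitting on a giant apple in a room"],
    ["purple_hair", "very aesthetic", "mystery tag", "1girl", "safe"],
    category_of,
    aliases,
))
```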
I'll be using the SafeFixers Epoch40 as a base for this new train.
Safe Fixers has shown superior context awareness and control of the system, slow cooked for nearly 2 weeks on 2 4090s, and is more akin to a true progression to the base of flux in the desired direction.
The sex pack has shown the opposite: high destructive capability, bad mixing, bad lora association, and lesser context control. Since the desire is to keep context, the outcome has to shift towards the foundational "SAFE" direction, which means I'll be finetuning the safe version with the NSFW images going forward.
Key differences;
The sex pack was trained to epoch 5 using A100s in a quick cook with 15000 averaged source images. Review shows the quality of the images is very hit or miss: there's monochrome that snuck in, greyscale, line drawings, some actual AI poison, long comics, and some defective images that took a while to completely pick out.
Safe Fixers was trained to epoch 40 with 15000 high fidelity, high score, (mostly) human made anime images. It has shown superior context awareness and control, which can't be overstated when mixing concepts.
Even at epoch 5 the sex pack was already too destructive to continue training, while the safe fixers stayed strong until epoch 40.
Learnings:
Since working with these two packs, I've learned some highly important lessons;
Image sizes cannot always be reliably bucketed on every device.
I've made software to resize images,
prune images that are too tall or too wide,
and run corruption, validity, and sanity checks for sussing out image bombs and corrupt downloads that would otherwise go unnoticed until the training program hits the point of $100 sunk cost.
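The dimension side of such checks can be sketched with plain arithmetic on the width and height read from an image header. The limits here (aspect ratio 4, 64 million pixels, longest side 2048) and both function names are assumptions for illustration. Actually decoding the file to catch corrupt downloads would need an image library and is left out.

```python
def check_dimensions(width, height, max_aspect=4.0, max_pixels=64_000_000):
    """Return "ok" or a short reason to prune, based on dimensions alone."""
    if width <= 0 or height <= 0:
        return "invalid size"
    if width * height > max_pixels:
        return "possible image bomb"
    if max(width, height) / min(width, height) > max_aspect:
        return "too tall" if height > width else "too wide"
    return "ok"


def fit_within(width, height, max_side=2048):
    """Scale down so the longest side is at most max_side; never upscale."""
    scale = min(1.0, max_side / max(width, height))
    return max(1, round(width * scale)), max(1, round(height * scale))


print(check_dimensions(512, 4096), fit_within(4096, 2048))
```

Running the cheap dimension check before any decode or resize means oversized or malformed files are rejected before they cost training time.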
Tag order is crucial. The system itself understands tags better if they are formatted in a specific order to create a specific scene.
I've customized my in-house tagging software to ensure the tag order fits within a specific paradigm from this point forward.
I've begun tagging everything with aesthetic and quality.
very aesthetic - 0.9^
aesthetic - 0.6 ^
displeasing
very displeasing
Automated NSFW detections.
The first 10,000 image best quality finetune coming - LR 0.000033:
This marks the beginning of version 4's direct core training, meaning 0.000033 will become the standard learn rate for version 4 until version 5. I will be using one third of the current TE learn rate for CLIP_L: 0.000000333.
I have been given a very high quality pack of mostly AI generated images with some of the best baseline quality I've seen in a pack of images.
As it stands, there is a large amount of information that CAN be shored up, and I can see it when I see nearly empty characters, monochrome, greyscale, and so on pop up when the tokens hit a certain point.
Since the repairs, anatomy fixers, and inclusions are so robustly solid, it's time to shore up the poses and core model to the professional grade quality I had planned when the model began. Flux put up a real fight on that front, so it took a while to get past the context point I wanted to reach.
This version has hit the majority of desired context markers, so the lora stack has been merged into one entity as of today; the mixed version.
The robustness of this model is quite high, since it's survived the equivalent of grade school in education. It's time to hit the real books though, and send it off to high school where it learns the big boy associations, the big boy numbers.
Three model mic drop - 11/2/2024 9:54 am gmt -7:
Three new models are available;
Each model is intended primarily for Flux1D and not Flux1D-DeDistilled. The outcomes from DeDistilled were looking pretty bad after I hit a certain point of divergence. The primary model has returned to Flux1D until the core derives so much that Flux1D will actually hurt it, in which case we will have to rename it.
Do not train these. I haven't decided on a consistent core yet, so just play with them for now.
safe - heavily trained safe tag
lots more showcase images here: https://civitai.com/articles/8401/simulacrum-v38-safe-e30-teaser-2-electric-boogaloo
the heavily trained safe pack is in full swing here, at 80% power with the sex pack omitted.
I'm very pleased with how this turned out. It's been cooking for a week.
the entire safe pack is based on art styles, artists, and a multitude of expected and fun stuff to play with, not meant for sex what-so-ever.
you can still pose characters, control characters, move them, and so on.
ideal for making art, comics, newspaper clippings, inpainting, and more.
explicit - heavily trained explicit tag
included part of the safe pack at very low strength, entirely dedicated to sexual poses and sex acts. the goal here was to introduce the core elements of sex, which likely didn't go over well, so a retrain will be in order.
however, currently it should be fun to play with.
If you pay close attention to the images I generated, you'll see the similarities to the Simulacrum core itself, which is a telltale sign that it still exists, it's still powerful, and it's still improving the base poses and model itself even now.
This is the mark of a steadfast core.
mix
- The two mixed together at high power and merged. Uncertain outcomes expected. Lots of fun to be had.
Feeding Simulacrum the Kama Sutra - 10/28/2024 7pm gmt - 7:
I've begun feeding the first wave of 15k sex pose images into the model, which contains a series of poses from many angles and from a multitude of archetypes, characters, and so on.
The tags are a mixture of danbooru, gelbooru, rule34xxx and rule34us. I've normalized many, but I decided against normalizing a lot of the more obscure tags that rule34us and rule34xxx bring to the table. I think it'll be more fun being able to proc more things this way.
This is stage 1 of 5 for sex pack 2, the first stage being education, the second stage being filling, third stage being fixing and finetuning, fourth stage being public release testing, and the final stage being full integration into the core model.
Sex pack 1 was primarily doggystyle (at about power 0.7, soon to be cranked to full power with the counterpart) and is built directly into Simv3, I'm sure plenty of you have noticed. There's going to be a super nsfw heavy version releasing soon, and a safe version releasing due to the training of the "safe" pack on the side.
I released it secretly because it was only one pose of 35, but it was also a successful test that needed expanding.

The "Safe" version is a fully divergent model with a more attentive guidance towards art and stylizing, meant more for sfw body integrations than it's counterpart, but will still support the same elements and pieces as it's boldly NSFW counterpart.
The current version is a hybrid, which will end up the core model for the future versions, which will eventually include all the sex poses with safe refinement for the counterpart.
The sourced data training for the furry section is going to begin soon. I've sourced about 200 species so far, each of which has over 1000 images. The tags are quite different, so there will be little risk of cross contamination as long as I keep certain tags separate, while still giving it power to proc the 1girl, 1boy, 2girls, etc. tags. Proper tag formula is integral to making it function properly.
Building the furry core is a bit daunting, since I don't really know much about the tags in general, but I'm up for the challenge and a quick study.
Contest is almost over, get your final entries in.
I've posted a 50k buzz bounty for someone to make me a mascot.
https://civitai.com/bounties/5177
Get to it if you want that buzz.
The primary image set is ANIME<<<
Dev1 = Very good context + faster
Dev1Distilled = Insanely high context + increased quality
It just translates really well to other things. The fact that the core realism model is still intact with the added training, shows proof of concept with the training method. For more information on how to make your own, consult one of my 50 scattered guides.
I recently mapped all the most important key information at the top of the feed.
As you can see from the nearly identical 1dev generations with the same prompts and seeds, the outcomes were substantially less context controlled than with DeDistilled. The generations are much faster though, so take your trade-off.
Released the Flux1D compacted as well tonight. I don't have the energy to do much more. Also sorry about the lack of a LORA as of right this minute. I'm having some problems figuring out a good way to merge the beast of a 64 dim 128 alpha model, with the lesser 32 dim 64 alpha, and 16 dim 32 alpha models that I combined to make this. I'll keep you all informed in my research progress for a safe merge.
Currently running a 10,000 image safe training run that will take over a week to complete on the 2 4090s I'm renting. It's gotten to the point where I may as well just buy a handful of them for my own use.
I regenned a few of the lower quality real images for DeDistilled below. Run it on upscale fix at a lower cfg to generate similar quality. It's not every time, but it's definitely holding context well.
Run the DeBlurr lora. It removes the effect that people keep identifying as "washed out"; in reality it's just the flux depth of field being interfered with by the T5. I call it the fixation kerfuffle.
Train loras on the UNET + CLIP_L, get T5 from wherever.
You remember that old Consistency Version 3? This is how it was supposed to look. Enjoy everyone, enjoy. Generation settings below.
The NSFW controllers are working quite well now. The next batch is 10,000 safebooru images to solidify the use of the tag "safe", but for now the system just kind of... puts a safe on the screen.
You can usually prompt NSFW elements directly, or by using the "explicit" tag to force them to pop.
There are three primary trainings; realistic, anime, and 3d. You can put them anywhere in the prompt at any time and force one of the things to become that if you want, or bleed it to the whole image. It was trained with over 35000 images so far, with over 850,000 steps. I kinda did a head count.
This is a stack of four unique loras trained with Flux1D2 directly trained on the Simulacrum Flux1D2 V23 merge;
I posted an article on how to make this this morning. They were all trained to function directly with DeDistilled inference. The outcome... makes anything that I've imagined. It needs a little elbow grease for a few things, but most of it works. Build the world, break the world, ass, titties made out of burgers, dicks made out of cheese, nobody cares. It's your world, you build it.
Settings:
These are unique to this DeDistilled model;
If you see OVERLAPPING LAYERS, increase the steps. There are contradictory trainings in both systems, so expect strange quirks until the 2 million image finetune is done.
DeDistilled Settings:
For realistic:
steps 35-50
cfg 6.5-9
For anime:
steps 20-30
cfg 4-7
flux cfg 0 << can shift to about 1.5 before massive degradation.
euler < simple/normal are my favorites. there may be other untested ones now after all the training.
<<< DOES NOT MATTER. Make any size. The bucketing training saw to that. 256 to 2048 with 35000 images sourced from everywhere. If it gets too big it starts making more than one image, too small probably won't generate anything.
1D Base Settings:
For realistic:
steps 20-40
cfg 1
flux cfg 3-5 (3.5 choice)
Same for anime, can do less or more up to you.
use euler < simple/normal
<<< DOES NOT MATTER. Make any size. The bucketing training saw to that. 256 to 2048 with 35000 images sourced from everywhere. If it gets too big it starts making more than one image, too small probably won't generate anything.
Experiments show good results upscaled from 768x768 to 1024x1024 at 25 steps with euler simple/normal on either model.
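For quick reference, the settings above can be collected into one lookup table. This is only a restatement of the listed ranges as a Python dict; the key names and the helper `out_of_range` are hypothetical. Applying the DeDistilled flux cfg range (0 to about 1.5) to both styles is an assumption, since it is only listed once.

```python
# (low, high) ranges copied from the settings text. Sampler: euler with simple/normal.
PRESETS = {
    "dedistilled": {
        # flux_cfg 0 to ~1.5 is assumed to apply to both styles.
        "realistic": {"steps": (35, 50), "cfg": (6.5, 9.0), "flux_cfg": (0.0, 1.5)},
        "anime": {"steps": (20, 30), "cfg": (4.0, 7.0), "flux_cfg": (0.0, 1.5)},
    },
    "base": {
        # "Same for anime" per the 1D Base Settings section.
        "realistic": {"steps": (20, 40), "cfg": (1.0, 1.0), "flux_cfg": (3.0, 5.0)},
        "anime": {"steps": (20, 40), "cfg": (1.0, 1.0), "flux_cfg": (3.0, 5.0)},
    },
}


def out_of_range(model, style, **values):
    """Return the names of settings that fall outside the listed range."""
    ranges = PRESETS[model][style]
    return [name for name, value in values.items()
            if not ranges[name][0] <= value <= ranges[name][1]]


print(out_of_range("base", "realistic", steps=50, cfg=1.0, flux_cfg=3.5))
```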
Additional generation tips including the list of tags are at this link. The lora combination will be uploaded soon as both individual loras and a recipe to compact them into this exact model.


















