SciStyle

v1 of SciStyle is a test model for a new image captioning pipeline I've been working on. The model was trained on a subset of 1k images of various styles/mediums. Surprised by the results for a model trained on only 1k images, I decided to release it here. The full model is currently being worked on.

For more info on the image captioning pipeline, refer to my Discord thread linked below.

Questions/Feedback/Updates?

Visit my thread on the Unstable Diffusion Discord

Info

S&D

Base Model: Stable Diffusion v1.5

Type: Experimental Fine-tune

Clip: 1

Medium: Multi-medium

Caption Style: Natural Language + Booru Style

Dataset Size: Subset, 4k images out of 25k images + DnD dataset

Training Resolution: 768x768

Difference from v1: More fantasy focused, additional training on a DnD dataset.

v1

Base Model: Stable Diffusion v1.5

Type: Experimental Fine-tune

Clip: 1

Medium: Multi-medium

Caption Style: Natural Language + Booru Style

Dataset Size: Subset, 1k images out of 25k images

Training Resolution: 768x768

Base Model: Stable Diffusion v1.5

Type: Experimental Fine-tune

Clip: 1

Medium: Multi-medium

Caption Style: Natural Language + Booru Style

Dataset Size: Subset, 6.5k images out of 25k images

Training Resolution: 768x768

Difference from v1: More species from various Sci-fi and fantasy universes.

Features

Multi-medium: Capable of generating images in multiple art mediums; simply include the medium in the prompt.
Natural Language & Booru: Accepts both natural language prompts and booru style prompts.
Extra Detail: Understands subtle details often skipped by SD models, such as the number of objects/subjects in a scene, background information, color information for various parts of the image, atmosphere, etc. (See my Discord thread above for more info on how this is achieved.)
Flexible: Can easily be merged with other SD1.5 checkpoints / LoRAs.

Usage

Special Tokens:

SciStyle: can be used as a class token at the beginning of the prompt, but is not required.
Tags for various art mediums: start the prompt with the medium, e.g., "a comic book illustration of" or "90s anime screencap of", or simply add the medium towards the end of the prompt, e.g., "comic book illustration" or "photorealistic". These are just examples of tag placement; feel free to experiment with other mediums.
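
The two tag placements described above can be sketched as a small helper. This is only an illustration: `build_prompt` and its parameters are hypothetical names, and joining the pieces with ", " is an assumption about separator style, not something the model requires.

```python
def build_prompt(subject, medium=None, medium_first=True, use_class_token=False):
    """Assemble a prompt string using the tag placements described above.

    Hypothetical helper for illustration; not part of any tool or of the model.
    """
    if medium and medium_first:
        # Leading placement, e.g. "a comic book illustration of <subject>"
        body = f"{medium} of {subject}"
    elif medium:
        # Trailing placement, e.g. "<subject>, photorealistic"
        body = f"{subject}, {medium}"
    else:
        body = subject

    if use_class_token:
        # The class token is optional and goes at the very beginning.
        # Assumption: a comma separates it from the rest of the prompt.
        return f"SciStyle, {body}"
    return body


if __name__ == "__main__":
    print(build_prompt("a dragon over a ruined city", medium="a comic book illustration"))
    print(build_prompt("a dragon over a ruined city", medium="photorealistic",
                       medium_first=False, use_class_token=True))
```

Either placement should work, so it is worth trying both for a given medium and comparing the results.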

Recommended Settings

Sampler/Solver:

Euler a
- Steps: 20 - 32
- CFG: 6 - 7.5
DPM++ SDE Karras
- Steps: 30 - 40
- CFG: 6 - 8.5
DPM++ 2M SDE Karras
- Steps: 50+
- CFG: 7 - 8

These are just recommendations.
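
For scripted generation, the ranges above can be kept in a small lookup table. The sketch below is a hypothetical helper (`RECOMMENDED` and `check_settings` are made-up names, not part of any UI or library) that only reports when a value falls outside the listed ranges; it does not enforce anything, since these are recommendations.

```python
# Ranges copied from the list above as (low, high); None means no upper bound was given.
RECOMMENDED = {
    "Euler a": {"steps": (20, 32), "cfg": (6.0, 7.5)},
    "DPM++ SDE Karras": {"steps": (30, 40), "cfg": (6.0, 8.5)},
    "DPM++ 2M SDE Karras": {"steps": (50, None), "cfg": (7.0, 8.0)},
}


def _in_range(value, bounds):
    low, high = bounds
    return value >= low and (high is None or value <= high)


def _describe(bounds):
    low, high = bounds
    return f"{low}+" if high is None else f"{low} - {high}"


def check_settings(sampler, steps, cfg):
    """Return a list of notes for values outside the recommended ranges."""
    rec = RECOMMENDED.get(sampler)
    if rec is None:
        return [f"no recommendation listed for sampler {sampler!r}"]
    notes = []
    if not _in_range(steps, rec["steps"]):
        notes.append(f"steps={steps} is outside the recommended {_describe(rec['steps'])}")
    if not _in_range(cfg, rec["cfg"]):
        notes.append(f"cfg={cfg} is outside the recommended {_describe(rec['cfg'])}")
    return notes


if __name__ == "__main__":
    for note in check_settings("Euler a", steps=50, cfg=7):
        print(note)
```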

Hires Fix

4x-UltraSharp - Link
Remacri - Link

Settings for all ESRGAN models:

Upscale by
- 1.5 if resolution is > 512x768
- Don't exceed 2.0 (unless you have a beefy rig)
Denoise Strength
- 0.25 - 0.35
Hires Steps
- If sampling steps > 60,
  - hires steps = half of sampling steps
- Otherwise, leave at 0
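
The rules above can be written out as a short function. This is a sketch under stated assumptions: `hires_settings` is a hypothetical name, "resolution > 512x768" is interpreted as a comparison of total pixel count, resolutions at or below that size are given the 2.0 ceiling, "half" is rounded down, and out-of-range denoise values are clamped into 0.25 - 0.35.

```python
def hires_settings(width, height, sampling_steps, denoise=0.3):
    """Derive Hires Fix values from the rules above (hypothetical helper)."""
    # Assumption: "resolution > 512x768" means a larger total pixel count.
    if width * height > 512 * 768:
        upscale_by = 1.5
    else:
        # Assumption: smaller base resolutions may go up to the stated ceiling of 2.0.
        upscale_by = 2.0

    # Hires steps: half of the sampling steps when sampling steps > 60, otherwise 0.
    # Assumption: odd step counts are rounded down.
    hires_steps = sampling_steps // 2 if sampling_steps > 60 else 0

    # Keep denoise strength inside the recommended 0.25 - 0.35 band.
    denoise = min(max(denoise, 0.25), 0.35)

    return {
        "upscale_by": upscale_by,
        "hires_steps": hires_steps,
        "denoise_strength": denoise,
    }


if __name__ == "__main__":
    print(hires_settings(768, 768, sampling_steps=80))
    print(hires_settings(512, 768, sampling_steps=30))
```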

Extensions

ADetailer
Download here

Neutral Prompt

Download here

Read the repo descriptions for usage guides.

Negative Embeddings

Only needed if you want to recreate one of the sample images. Personally, I would avoid using negative embeddings and instead start with a simple negative prompt, then add or remove tokens for each new idea. I only use them to speed up inference during sample generation. That being said, other negative embeddings such as EasyNegative, etc. are also fine to use with this model.