DALL-E 3-like Girls

Z Image Turbo LoRA Info:

Generation:

A simple two-in-one ComfyUI workflow with dual samplers, set up for somewhat more realistic image generation (see the example images). You can increase or decrease the DALL-E 3-like style by changing the "end_at_step" and "start_at_step" values inside the sampler nodes.

The first sampler uses the LoRA and, in that workflow, ends at step 6. The second sampler then starts at step 6 without the LoRA, so the regular Z Image Turbo model finishes the image. The regular model is very good at ironing out the details for a more realistic look.

To push the DALL-E 3-like look to the maximum, as in most of the example images here, raise both "end_at_step" and "start_at_step" up to 8. Alternatively, use the res_2s/res_5s/res_6s sampler with the beta57 scheduler, which increases quality and likeness by a lot but is slower.
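The step split described above can be sketched as a small Python helper that produces the settings for both samplers. This is only an illustration, assuming the sampler nodes are ComfyUI's KSampler (Advanced) nodes; the function name is hypothetical, and the "add_noise" and "return_with_leftover_noise" values are the usual convention for chaining two such nodes, not something confirmed from the workflow file.

```python
def dual_sampler_settings(split_step: int, total_steps: int = 8) -> tuple[dict, dict]:
    """Return (lora_pass, base_pass) settings for two chained advanced samplers.

    split_step is the value shared by "end_at_step" on the first sampler and
    "start_at_step" on the second one. Hypothetical helper for illustration.
    """
    if not 0 < split_step <= total_steps:
        raise ValueError("split_step must be between 1 and total_steps")

    # Defaults from this page: 8 steps, cfg 1, euler-simple.
    shared = {
        "steps": total_steps,
        "cfg": 1.0,
        "sampler_name": "euler",
        "scheduler": "simple",
    }
    # First pass: model with the LoRA applied, stops early and hands over a noisy latent.
    lora_pass = {
        **shared,
        "add_noise": "enable",                    # assumed convention
        "start_at_step": 0,
        "end_at_step": split_step,
        "return_with_leftover_noise": "enable",   # assumed convention
    }
    # Second pass: regular model without the LoRA finishes the remaining steps.
    base_pass = {
        **shared,
        "add_noise": "disable",                   # assumed convention
        "start_at_step": split_step,
        "end_at_step": total_steps,
        "return_with_leftover_noise": "disable",  # assumed convention
    }
    return lora_pass, base_pass


# Split used in the workflow: LoRA for steps 0-6, regular model for steps 6-8.
lora_pass, base_pass = dual_sampler_settings(6)
```

Calling it with a split of 8 leaves the second pass with no steps to run, which corresponds to the maximum DALL-E 3-like look.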

LoRA strength in the ComfyUI frontend works best at around 0.7.

Not trained with a trigger word, but having "detailed" in the prompt is basically mandatory for a style improvement.

Recommended resolutions range from 1280x1280, 1024x1536, and 1536x1024 up to 2048x2048, but 1024x1024, 768x1280, 1280x768, and other aspect ratios are also okay.

The rest are default settings: 8 steps, cfg 1, euler-simple, shift 3.

Simple tag-based prompt example:

detailed, three girls, latina, puckered lips, sun shining on their faces, pool party, heavy makeup

Other useful prompt words to add that were common in the dataset:

tongue out, puckered lips, smile, face close-up, laying down, on back, on stomach, ring light, asian, latina, african

For a more 3D animated movie look, include both:

3d, animation
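The prompt advice above can be sketched as a tiny helper that always puts "detailed" first and optionally appends the two 3D tags. The function name and parameters are hypothetical and only meant as an illustration.

```python
def build_prompt(tags: list[str], animated_3d: bool = False) -> str:
    """Join tags into a comma-separated prompt that always starts with "detailed"."""
    # Drop empty entries and any existing "detailed" so it is not repeated.
    cleaned = [t.strip() for t in tags if t.strip()]
    cleaned = [t for t in cleaned if t.lower() != "detailed"]
    parts = ["detailed"] + cleaned

    # Both tags are needed for the 3D animated movie look.
    if animated_3d:
        for extra in ("3d", "animation"):
            if extra not in [p.lower() for p in parts]:
                parts.append(extra)
    return ", ".join(parts)


prompt = build_prompt(["three girls", "latina", "puckered lips", "pool party"])
```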

Training info:

Used ai-toolkit with this official tutorial and its settings, except without "Differential Guidance". Used the unquantized model and text encoder.

I trained for 14000 steps and picked the 12500-step checkpoint as the clear winner for what I wanted, although the model started giving good results well before that point in training.

~110 images, most at 1024x1024, with very few, very simple tag-based captions.

Old Qwen Image LoRA Info:

Generation:

No trigger word.

Very simple tag-based prompt example:

detailed, two girls, tongue out, smile, night rave

Other useful prompt words to swap into the prompt that were common in the dataset:

tongue out, puckered lips, smile, face close-up, laying down, on back, on stomach, ring light, asian, latina, african

For a more 3D animated movie look, include both:

3d, animation

Strongly recommended settings:

For every prompt you like, I encourage you to try both the default LoRA, which was trained for 3000 steps, and the 3250-step version. Both seem good and different enough.
Using "detailed" in the prompt should always be better.
Generating at 1328x1328 should always be better than 1024x1024, but try different resolutions.
Try euler-simple, euler ancestral-simple, or lcm-simple, with shift from 0.5 to 4.
I found the settings I personally like while focusing on a fast Qwen Image workflow with the 4-step lightning LoRA. Tweaking the generation settings easily shifts the DALL-E 3-like girl style around, so you should be able to find settings for the look you like.

My settings:

I actually use the Qwen Image Edit lightning LoRA, which gives much more interesting results. Aside from the tag-based training captions, I think it is the biggest contributor to solving Qwen Image's low seed variance problem, albeit usually at the cost of a slightly grainier image. You can also try other lightning LoRAs.

1328x1328, 4 steps, cfg 1, euler-simple, shift 2.5 (and 0.5/1/2/3.1)
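To compare the recommended options side by side, the combinations above can be listed as a small settings grid. This is a rough sketch: the function name and dictionary keys are hypothetical, the sampler strings assume ComfyUI's identifiers ("euler", "euler_ancestral", "lcm"), and the shift values are just the ones mentioned on this page.

```python
from itertools import product


def qwen_sweep(
    checkpoints=(3000, 3250),
    samplers=("euler", "euler_ancestral", "lcm"),
    shifts=(0.5, 1.0, 2.0, 2.5, 3.1, 4.0),
):
    """List every combination of LoRA checkpoint, sampler, and shift to try."""
    return [
        {
            "lora_checkpoint_steps": ckpt,  # 3000-step default or 3250-step version
            "width": 1328,
            "height": 1328,
            "steps": 4,                     # 4-step lightning LoRA workflow
            "cfg": 1.0,
            "sampler_name": sampler,        # assumed ComfyUI identifier
            "scheduler": "simple",
            "shift": shift,
        }
        for ckpt, sampler, shift in product(checkpoints, samplers, shifts)
    ]


grid = qwen_sweep()
print(len(grid), "settings combinations to try per prompt and seed")
```

Running every combination on the same prompt and seed makes it easier to see how the sampler and shift move the style around.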

I did minimal testing without lightning: cfg 2.5 with 50 steps worked OK. Appending the officially recommended ", Ultra HD, 4K, cinematic composition." string to the prompt in the 50-step workflow also seems good.

Limitations:

There was some slight body-horror blur on hands in the dataset, which can bleed in.

The 3000-step model can also show some concept bleed in the form of weird, wacky clothing and tattoos, but I still think it has better DALL-E 3-like styles and faces.

Depending on the prompt and settings, and particularly at 1024x1024, concept bleed of some liquid spilled on the body can happen at times. I assume it comes from some more unique images in the training dataset that weren't fully tagged.

If any of the problems above happen on a seed you like, you can adjust the positive/negative prompts to mitigate the problematic concept slightly and lower the LoRA strength until the bleed is gone.

Training info:

Used ai-toolkit with this official tutorial and its settings, with a 0.0002 learning rate and 3500 steps. 3500 steps overcooks the LoRA, while the 3000 and 3250 checkpoints are good.

~110 images, mostly 1024x1024, with very few, very simple tag-based captions.

Post in the comments below if you find an interesting generation settings setup.

The content posted here is not affiliated with OpenAI.

Model type	LORA
Base model	ZImageTurbo
Published	2025-11-30
Trained words	detailed
