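This is not a Stable Diffusion model. It needs its own web UI. Read below for more information.

Note: I am not the author of this model; the original HF repository is linked below.

Kandinsky 3.1 is now available:

The model's source code, released under the Apache 2.0 license, is on GitHub: https://github.com/ai-forever/Kandinsky-3
The model can be downloaded and used from Hugging Face: https://huggingface.co/ai-forever/Kandinsky3.1
The model has its own "AUTOMATIC1111"-style web UI for local use, Kubin: https://github.com/seruva19/kubin
You can test Kandinsky's capabilities on the FusionBrain website: https://www.sberbank.com/promo/kandinsky/

Original description from the repository: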

license: apache-2.0

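Kandinsky-3: Text-to-image Diffusion Model

Kandinsky 3.1:

Description:

We present Kandinsky 3.1, the follow-up to the Kandinsky 3.0 model, a large-scale text-to-image generation model based on latent diffusion. It continues the Kandinsky series of text-to-image models and reflects our progress in improving the quality and realism of generated images. We have enhanced the model and added a variety of useful features and modes so that users have more ways to take full advantage of the new model.

Kandinsky Flash (Kandinsky 3.0 Refiner)

Diffusion models struggle with fast image generation. To address this, we trained the Kandinsky Flash model using the Adversarial Diffusion Distillation approach with some modifications: we trained the model in latent space, which reduced memory overhead, and we removed the distillation loss because it had no effect on training. In addition, we apply Kandinsky Flash to images generated by Kandinsky 3.0 to improve their visual quality.

Architecture

To train Kandinsky Flash, we used the following discriminator architecture: half of the Kandinsky 3.0 U-Net encoder, with prediction heads added.

How to use:

See the example Jupyter notebooks in the ./examples folder: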

import torch
from kandinsky3 import get_T2I_Flash_pipeline

device_map = torch.device('cuda:0')
dtype_map = {
    'unet': torch.float32,
    'text_encoder': torch.float16,
    'movq': torch.float32,
}

t2i_pipe = get_T2I_Flash_pipeline(
    device_map, dtype_map
)

res = t2i_pipe("A cute corgi lives in a house made out of sushi.")

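Kandinsky Inpainting

We have also released an updated inpainting model. It was additionally trained on an object detection dataset, which makes object generation more stable. The new weights are available at ai-forever/Kandinsky3.1. See the usage example.

Prompt beautification

The prompt plays a crucial role in text-to-image generation, so in Kandinsky 3.1 we decided to use a language model to improve prompts. We used Intel's neural-chat-7b-v3-1 as the LLM, with the following system prompt:

### System: You are a prompt engineer. Your mission is to expand prompts written by user. You should provide the best prompt for text to image generation in English.
### User:
{prompt}
### Assistant:
{answer of the model}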
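
As a minimal sketch of how the template above could be filled in before it is sent to the LLM, the snippet below formats the system and user turns as one string. The helper name `build_beautifier_prompt` is hypothetical and is not part of the kandinsky3 package. The generation call itself is left out; the model's reply would follow the final `### Assistant:` line.

```python
# System prompt wording follows the template above.
SYSTEM_PROMPT = (
    "You are a prompt engineer. Your mission is to expand prompts written by user. "
    "You should provide the best prompt for text to image generation in English."
)


def build_beautifier_prompt(user_prompt: str) -> str:
    """Fill the '### System / ### User / ### Assistant' template (hypothetical helper)."""
    return (
        f"### System: {SYSTEM_PROMPT}\n"
        f"### User:\n{user_prompt.strip()}\n"
        "### Assistant:\n"
    )


# The LLM is expected to continue the text after the last line.
text = build_beautifier_prompt("A cute corgi lives in a house made out of sushi.")
print(text)
```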

KandiSuperRes

For more information about KandiSuperRes, see: https://github.com/ai-forever/KandiSuperRes/

Kandinsky IP-Adapter and Kandinsky ControlNet

To use an image as a condition in the Kandinsky model, we trained an IP-Adapter and a HED-based ControlNet model. For more details, see: https://github.com/ai-forever/kandinsky3-diffusers

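Kandinsky 3.0:

Description:

Kandinsky 3.0 is an open-source text-to-image diffusion model built on the Kandinsky2-x model family. Compared with its predecessors, it was trained on more data, in particular data related to Russian culture, which lets it generate images related to Russian culture. We also improved the model's text understanding and visual quality by increasing the size of the text encoder and of the diffusion U-Net, respectively.

More information: for training details and generation examples, see our post. The English version will be released in a few days.

Architecture details:

The architecture consists of three parts:

Text encoder Flan-UL2 (encoder part): 8.6B parameters
Latent diffusion U-Net: 3B parameters
MoVQ encoder/decoder: 267M parameters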
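
For a rough sense of scale, the sketch below adds up the three component sizes listed above and estimates the size of the weights alone under the precision mix used in the text-to-image usage example (float16 text encoder, float32 U-Net and MoVQ). It assumes 2 bytes per float16 parameter and 4 bytes per float32 parameter, and it ignores activations, buffers, and framework overhead, so real memory use will be higher. The function name is illustrative only.

```python
# Parameter counts taken from the component list above.
PARAMS = {
    "text_encoder": 8.6e9,  # Flan-UL2, encoder part
    "unet": 3.0e9,          # latent diffusion U-Net
    "movq": 267e6,          # MoVQ encoder/decoder
}

# Bytes per parameter for the two precisions used in the usage examples.
BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weight_gb(dtype_map: dict) -> dict:
    """Approximate weight size per component in GB (1 GB = 1e9 bytes)."""
    return {
        name: PARAMS[name] * BYTES_PER_PARAM[dtype] / 1e9
        for name, dtype in dtype_map.items()
    }


sizes = weight_gb({"text_encoder": "float16", "unet": "float32", "movq": "float32"})
total_params_b = sum(PARAMS.values()) / 1e9
total_gb = sum(sizes.values())
print(f"total parameters: {total_params_b:.2f}B")
print(f"approximate weights: {total_gb:.1f} GB")
```

Under these assumptions the three components total about 11.9B parameters and roughly 30 GB of weights, most of it in the text encoder and the U-Net.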

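Models

We release two models:

Base: the base text-to-image diffusion model. It was trained for 2M steps on 400 A100 GPUs.
Inpainting: the inpainting version of the model. It was initialized from the final checkpoint of the base model and trained for 250k steps on 300 A100 GPUs.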

Installing

To install, first create a conda environment:

conda create -n kandinsky -y python=3.8;
source activate kandinsky;
pip install torch==1.10.1+cu111 torchvision==0.11.2+cu111 torchaudio==0.10.1 -f https://download.pytorch.org/whl/cu113/torch_stable.html;
pip install -r requirements.txt;

The exact dependencies were captured with pip freeze and are listed in exact_requirements.txt.

How to use:

See the example Jupyter notebooks in the ./examples folder.

1. Text-to-image

import sys
sys.path.append('..')

import torch
from kandinsky3 import get_T2I_pipeline

device_map = torch.device('cuda:0')
dtype_map = {
    'unet': torch.float32,
    'text_encoder': torch.float16,
    'movq': torch.float32,
}

t2i_pipe = get_T2I_pipeline(
    device_map, dtype_map,
)
res = t2i_pipe("A cute corgi lives in a house made out of sushi.")

res[0]

2. Inpainting

import torch
from kandinsky3 import get_inpainting_pipeline

device_map = torch.device('cuda:0')
dtype_map = {
    'unet': torch.float16,
    'text_encoder': torch.float16,
    'movq': torch.float32,
}

inp_pipe = get_inpainting_pipeline(
    device_map, dtype_map,
)

image = ... # PIL Image
mask = ... # NumPy array (HxW). Set to 1 where the image should be masked
image = inp_pipe("A cute corgi lives in a house made out of sushi.", image, mask)
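
The mask argument is described only as an HxW NumPy array with 1 at the positions to mask. One way to build such an array is sketched below for a rectangular region. The helper name, the float32 dtype, and the 1024x1024 size are assumptions for illustration, and the mask presumably needs the same height and width as the input image.

```python
import numpy as np


def rectangle_mask(height: int, width: int, box: tuple) -> np.ndarray:
    """Build an HxW mask with 1 inside box=(top, left, bottom, right), 0 elsewhere.

    Hypothetical helper; the dtype is an assumption, not taken from the repository.
    """
    top, left, bottom, right = box
    mask = np.zeros((height, width), dtype=np.float32)
    mask[top:bottom, left:right] = 1.0  # region to be repainted
    return mask


# Mask the central 512x512 square of a 1024x1024 image.
mask = rectangle_mask(1024, 1024, box=(256, 256, 768, 768))
print(mask.shape, int(mask.sum()))
```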

Authors

Vladimir Arkhipkin: GitHub
Anastasia Maltseva: GitHub
Andrei Filatov: GitHub
Igor Pavlov: GitHub
Julia Agafonova
Arseniy Shakhmatov: GitHub, Blog
Andrey Kuznetsov: GitHub, Blog
Denis Dimitrov: GitHub, Blog

Citation

@misc{arkhipkin2023kandinsky,
      title={Kandinsky 3.0 Technical Report},
      author={Vladimir Arkhipkin and Andrei Filatov and Viacheslav Vasilev and Anastasia Maltseva and Said Azizov and Igor Pavlov and Julia Agafonova and Andrey Kuznetsov and Denis Dimitrov},
      year={2023},
      eprint={2312.03511},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Model type	Checkpoint
Base model	Other
Published	6/24/2024
