Genshin TCG Style [Wan 14B]

Details

Model description

Trigger word: Genshin_TCG
Model: Wan 2.1 T2V 14B
All examples were generated with a LoRA strength of 1.0 and CFG = 6
Inference used Kijai's workflow

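For prompts, I recommend the following structure:

"Genshin_TCG medium shot" + character description (appearance, pose, clothing) + key item (weapon/artifact) + background + dynamic elements. Pay particular attention to color contrast (dark armor vs. glowing ornaments) and a mystical atmosphere (starry sky, magic particles).

If you want a golden frame like the one on TCG cards, append this to the end of the prompt:

The border is adorned with a golden outline featuring intricate patterns, with a star-shaped emblem in each corner and delicate filigree along the edges, creating an elegant and refined visual effect.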
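The recommended prompt structure can be assembled programmatically. The following is only a minimal sketch: the function name `build_prompt`, its parameters, and the sample character are hypothetical, and the frame suffix is an English rendering of the sentence above rather than a verified original string.

```python
TRIGGER = "Genshin_TCG"

# English rendering of the gold-frame suffix (assumed wording, not a verified original).
GOLD_FRAME_SUFFIX = (
    "The border is adorned with a golden outline featuring intricate patterns, "
    "with a star-shaped emblem in each corner and delicate filigree along the edges, "
    "creating an elegant and refined visual effect."
)


def build_prompt(character, key_item, background, dynamic_elements, gold_frame=False):
    """Join the fields in order: trigger + shot, character, item, background, motion."""
    parts = [f"{TRIGGER} medium shot", character, key_item, background, dynamic_elements]
    # Drop empty fields and trailing punctuation so the joined prompt stays tidy.
    cleaned = [p.strip().rstrip(".,") for p in parts if p and p.strip()]
    prompt = ", ".join(cleaned) + "."
    if gold_frame:
        prompt += " " + GOLD_FRAME_SUFFIX
    return prompt


# Hypothetical example following the color-contrast and atmosphere advice.
example = build_prompt(
    character="a knight in dark armor with glowing blue ornaments, sword raised",
    key_item="a radiant greatsword",
    background="a starry night sky",
    dynamic_elements="magic particles swirling around the blade",
    gold_frame=True,
)
print(example)
```

Keeping the trigger word and shot type first mirrors the structure recommended above; the remaining fields can be left empty when they do not apply.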

The previous version, for Wan 1.3B, can be found here: /model/1728768/genshin-tcg-style-wan-13b

Training details

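Training the 14B version was much easier than training the 1.3B version: motion is smooth and generations show almost no artifacts. The dataset consisted of 54 short videos from Genius Invokation TCG, the card game in Genshin Impact. Because I trained with diffusion-pipe, only the toml config files are provided below.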

Dataset config:

resolutions = [[514, 304]]
enable_ar_bucket = true
min_ar = 0.5
max_ar = 2.0
num_ar_buckets = 7
frame_buckets = [1, 32, 33]

[[directory]]
path = "/home/user/Genshin_TCG_dataset/videos/304_514"
num_repeats = 5
resolutions = [[514, 304]]

[[directory]]
path = "/home/user/Genshin_TCG_dataset/videos/368_620"
num_repeats = 5
resolutions = [[620, 368]]

[[directory]]
path = "/home/user/Genshin_TCG_dataset/videos/492_828"
num_repeats = 5
resolutions = [[828, 492]]
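To see what `min_ar`, `max_ar`, and `num_ar_buckets` imply, here is a rough sketch of the bucket layout. It assumes, as I understand the diffusion-pipe example config, that aspect ratios are width / height and that buckets are evenly spaced in log space. The nearest-bucket rule and the reading of the directory names as width_height are my own assumptions, not something stated in the configs.

```python
import math


def ar_buckets(min_ar, max_ar, num_buckets):
    """Return aspect ratios (width / height) evenly spaced in log space."""
    if num_buckets < 2:
        return [min_ar]
    log_min, log_max = math.log(min_ar), math.log(max_ar)
    step = (log_max - log_min) / (num_buckets - 1)
    return [math.exp(log_min + i * step) for i in range(num_buckets)]


def nearest_bucket(width, height, buckets):
    """Pick the bucket whose log aspect ratio is closest to width / height (assumed rule)."""
    target = math.log(width / height)
    return min(buckets, key=lambda ar: abs(math.log(ar) - target))


buckets = ar_buckets(0.5, 2.0, 7)
print([round(ar, 3) for ar in buckets])

# Assumption: directory names 304_514, 368_620, 492_828 encode width_height.
for w, h in [(304, 514), (368, 620), (492, 828)]:
    print((w, h), round(w / h, 3), "->", round(nearest_bucket(w, h, buckets), 3))
```

Under these assumptions all three clip sizes have nearly the same aspect ratio and would land in the same bucket, so the three directories differ mainly in pixel area.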

Training config:

output_dir = "/home/user/Genshin_TCG/14B"
dataset = "/home/user/config/dataset/dataset_v001.toml"

epochs = 80
micro_batch_size_per_gpu = 1
pipeline_stages = 1
gradient_accumulation_steps = 1
gradient_clipping = 1
warmup_steps = 10
eval_every_n_epochs = 1
eval_before_first_step = true
eval_micro_batch_size_per_gpu = 1
eval_gradient_accumulation_steps = 1
save_every_n_epochs = 1
activation_checkpointing = 'unsloth'
partition_method = "parameters"
save_dtype = "bfloat16"
caching_batch_size = 1
steps_per_print = 10
video_clip_mode = "single_beginning"
blocks_to_swap = 32

[model]
type = "wan"
ckpt_path = "/home/user/Wan2.1-T2V-14B"
dtype = "bfloat16"
transformer_dtype = "float8"
timestep_sample_method = "logit_normal"

[adapter]
type = "lora"
rank = 64
dtype = "bfloat16"

[optimizer]
type = 'AdamW8bitKahan'
lr = 5e-5
betas = [0.9, 0.99]
weight_decay = 0.01
stabilize = false
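For a sense of scale, the number of optimizer steps can be estimated from the clip count, `num_repeats`, `epochs`, and the batch settings. This is a back-of-the-envelope sketch only: the function name is hypothetical, a single GPU is assumed, and the configs do not say how the 54 clips are split across the three directories, so the example counts them once.

```python
import math


def optimizer_steps(items_per_directory, epochs, micro_batch_size=1,
                    grad_accum_steps=1, num_gpus=1):
    """Estimate optimizer steps from (clip_count, num_repeats) pairs per directory."""
    items_per_epoch = sum(clips * repeats for clips, repeats in items_per_directory)
    effective_batch = micro_batch_size * grad_accum_steps * num_gpus
    steps_per_epoch = math.ceil(items_per_epoch / effective_batch)
    return steps_per_epoch, steps_per_epoch * epochs


# Hypothetical split: the 54 clips counted once, each repeated 5 times per epoch.
per_epoch, total = optimizer_steps([(54, 5)], epochs=80)
print(per_epoch, total)
```

If each of the three directories instead held its own copy of all 54 clips at a different resolution, passing `[(54, 5)] * 3` would give the corresponding estimate. Either way, `warmup_steps = 10` covers only a small fraction of the run.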
