Wan 2.2 Pose Control

A workflow that lets you pose characters using First-Frame-Last-Frame mode, with the target pose as the last frame.

For some time I've been trying to solve character posing with open-weight models. My previous attempt with Flux.2 Klein was reasonably good, but it suffered from style bleeding and didn't respect the original character's proportions (such as the head-to-body ratio). Character consistency is something image-editing models still struggle with, especially for stylized characters, but there's one exception: the Wan 2.2 I2V video model. Character consistency is something you can expect from a video model, right?

After extensive experiments with the Wan I2V model, I discovered a prompting technique that lets you "put the character from image_1 into the pose from image_2".

So, our task sounds like this:
"Take this character on the left and make her copy the pose on the right"

There are two ways to do this using local open-weight models:

  1. Flux.2 Klein character replacement workflow

  2. Wan 2.2 Pose Control workflow (this is what this post is about)

And this is what the result looks like for each method:

Let's compare the results with closed-source models too. They get the character design right but not the style fidelity. I guess even big multimodal image-editing models can't reach true character consistency, while for video models it's just an innate property.

The idea is simple: ask Wan to generate a sequence of 80 frames using First-Frame-Last-Frame mode. This frame sequence consists of 4 parts:

  1. The subject is just standing there

  2. The subject moves, copying the pose from the pose reference

  3. The subject morphs into the character from the pose reference

  4. The character from the pose reference is in the frame

Our goal here is to get a single frame where our subject is standing, sitting, or lying in the pose from the pose reference image but hasn't yet morphed into the character from that image. To do that, we have to structure the text prompt so that the transition from the first frame to the last frame is as smooth as possible. That way, information about the subject (design and style) and information about the pose meet in the middle of the frame sequence to give us the desired result.
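To narrow down where to look for that frame, here is a rough sketch in Python. It assumes the four parts take roughly equal time, which Wan does not guarantee, so treat the result only as a starting point for scrubbing through the frames by hand. The function name and the `window` parameter are made up for illustration and are not part of the workflow.

```python
def candidate_frames(total_frames: int = 80, parts: int = 4, window: int = 5) -> list[int]:
    """Return 0-based frame indices around the boundary between part 2 and part 3.

    Assumption (not from the workflow): all parts last about the same number
    of frames, so part 3 (the morph) starts at total_frames * 2 / parts.
    """
    boundary = total_frames * 2 // parts       # first frame of part 3 under that assumption
    start = max(0, boundary - window)          # a few frames before the morph starts
    stop = min(total_frames, boundary + window)  # a few frames after it starts
    return list(range(start, stop))


frames_to_check = candidate_frames()
print(frames_to_check)
```

With 80 frames this points at the frames around the middle of the sequence. If the pose is reached earlier or the morph starts later, widen `window` or just inspect the whole sequence.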

How to write a structured prompt

Here are the two prompts that were used in the example video above:

Silver hair woman

0s: girl with short silver hair, in green pleated skirt and leather boots is standing
1s: girl with short silver hair, in green pleated skirt and leather boots turns to the left, kneels, places left hand on her head, puts right hand between her legs
2s: she keeps her pose frozen in place. Scene transitions into another scene
3s: her body transforms into another character with white skin, bald head at white background

Black beard man

0s: black man with sharp teeth in green suit and dark pants is standing at white background
1s: black man with sharp teeth in green suit and dark pants sits in the armchair with tilted head and hand at his chin, crosses legs
2s: he keeps his pose frozen in place. Scene transitions into another scene
3s: his body transforms into another character short orange dress, orange top hat, brown hair and fishnet

The subject description is repeated, so we can extract it using Apply Text Template from the comfy-mtb extension.

We can extract the subject description and get this template:

Silver hair woman

0s: {var_1} is standing
1s: {var_1} turns to the left, kneels, places left hand on her head, puts right hand between her legs
2s: she keeps her pose frozen in place. Scene transitions into another scene
3s: her body transforms into another character with white skin, bald head at white background

Black beard man

0s: {var_1} is standing at white background
1s: {var_1} sits in the armchair with tilted head and hand at his chin, crosses legs
2s: he keeps his pose frozen in place. Scene transitions into another scene
3s: his body transforms into another character short orange dress, orange top hat, brown hair and fishnet
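If you want to prepare prompts outside ComfyUI, the substitution itself is easy to reproduce. The snippet below is a minimal Python sketch of the idea, not the comfy-mtb implementation: `apply_template` is a hypothetical helper that simply replaces `{var_1}`-style placeholders with plain string replacement.

```python
def apply_template(template: str, variables: dict[str, str]) -> str:
    """Replace every {name} placeholder with its value (hypothetical helper)."""
    result = template
    for name, value in variables.items():
        result = result.replace("{" + name + "}", value)
    return result


# Template for the second example, copied from above.
TEMPLATE = "\n".join([
    "0s: {var_1} is standing at white background",
    "1s: {var_1} sits in the armchair with tilted head and hand at his chin, crosses legs",
    "2s: he keeps his pose frozen in place. Scene transitions into another scene",
    "3s: his body transforms into another character short orange dress, "
    "orange top hat, brown hair and fishnet",
])

subject = "black man with sharp teeth in green suit and dark pants"
prompt = apply_template(TEMPLATE, {"var_1": subject})
print(prompt)
```

Keeping the subject description in a single variable means you edit it in one place when you need to add a design detail.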

Let's examine the 4 parts of this prompt.

0s - Initial description

This is where you describe your first frame. For the most part, 'is standing' is enough, but you can also specify the initial pose of your subject.

1s - Actual posing

This is where you specify the movements the subject must make to get from the initial pose to the target pose. Simple movements (turns left, sits down, crouches, raises hand) separated by commas work best. You can also add 'Camera follows his movement' if your target pose requires a different camera angle.

2s - Pause before scene transition

This line is always the same: "he/she keeps his/her pose frozen in place. Scene transitions into another scene". The part "Scene transitions into another scene" is the most important here: surprisingly, Wan 2.2 respects this boundary.

3s - Anchoring your last frame

It goes like this: "his/her body transforms into another character <description of the character in the last frame>". We want Wan 2.2 to understand that the character at the start of the video is different from the character at the end of the video.
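The four parts can be put together mechanically. Below is a small Python sketch that assembles such a prompt; the function and parameter names are my own invention for illustration, and only the wording of the four lines comes from the prompts above.

```python
def build_pose_prompt(
    subject: str,
    movements: list[str],
    pronoun: str,               # "he" or "she"
    possessive: str,            # "his" or "her"
    target_description: str,    # character in the last frame (pose reference)
    initial_pose: str = "is standing",
) -> str:
    """Assemble the 0s/1s/2s/3s prompt described above (illustrative helper)."""
    lines = [
        # 0s: initial description of the first frame
        f"0s: {subject} {initial_pose}",
        # 1s: simple movements separated by commas
        f"1s: {subject} {', '.join(movements)}",
        # 2s: fixed pause plus the scene-transition boundary
        f"2s: {pronoun} keeps {possessive} pose frozen in place. "
        "Scene transitions into another scene",
        # 3s: anchor the last frame as a different character
        f"3s: {possessive} body transforms into another character {target_description}",
    ]
    return "\n".join(lines)


example = build_pose_prompt(
    subject="black man with sharp teeth in green suit and dark pants",
    movements=[
        "sits in the armchair with tilted head and hand at his chin",
        "crosses legs",
    ],
    pronoun="he",
    possessive="his",
    target_description="short orange dress, orange top hat, brown hair and fishnet",
    initial_pose="is standing at white background",
)
print(example)
```

Only the subject, the movements, and the last-frame description change between runs; the 2s line stays fixed.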

Practical example

Let's practice what we've learned. Here's our subject and the pose images:

Start with the subject description. Nothing fancy here:

The next step is to describe the movements:

And lastly, write the transition to the last frame:

Unfortunately it fails:

Wan 2.2 managed to capture the gun's position but not the pose. The main reason is that the black clothes in our target image keep the model from "reading" the pose. Luckily, we can fix that by editing the pose reference in Flux.2 with this prompt:

remove hair, remove clothes and draw this person bald and in skin tone underwear. Turn into white wireframe figure

Run the Pose Control workflow again with the updated prompt:

This time the result is much better:

Some tips:

  • The whole process works best if there's noticeable contrast between the first frame and the last frame: different hair color, skin color, background, etc. You can even pre-process your pose reference with another model (turn it into a wireframe mannequin figure) so Wan has a better chance of reading the pose.

  • If some elements of the character design change (gloves tend to disappear too early), add them to the subject description so the model remembers them.

  • If your subject image and pose reference image have different sizes, try adding "Camera zooms in capturing new view" or "Camera zooms out capturing new view".
