# Kandinsky 2.2

[Kandinsky 2.2](https://habr.com/ru/companies/sberbank/articles/747446/)

## Abstract

Kandinsky 2.2 brings substantial improvements over its predecessor, Kandinsky 2.1, by introducing a new, more powerful image encoder, CLIP-ViT-G, and support for ControlNet. Switching to CLIP-ViT-G as the image encoder significantly improves the model's ability to generate more aesthetic images and to understand text, enhancing overall performance. The ControlNet mechanism lets the model control the image generation process more precisely, which leads to more accurate, visually appealing outputs and opens new possibilities for text-guided image manipulation.

<div align="center">
<img src="https://github.com/okotaku/diffengine/assets/24734142/b07d82fb-4c2c-4216-a4b1-a64b278cee2a"/>
</div>

## Citation

```
```

## Run Training

Run the training with the following commands:

```
# single gpu
$ mim train diffengine ${CONFIG_FILE}
# multi gpus
$ mim train diffengine ${CONFIG_FILE} --gpus 2 --launcher pytorch

# Example.
$ mim train diffengine configs/kandinsky_v22/kandinsky_v22_prior_pokemon_blip.py
```

## Inference prior with diffusers

Once you have trained a model, specify the path to the saved checkpoint and use it for inference with the `diffusers.pipeline` module.
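
Checkpoints appear to be saved under `work_dirs/<config_name>/step<N>`, as in the example path below. As a convenience, you can resolve the most recent `step<N>` directory programmatically; this helper is not part of diffengine, and the directory layout is an assumption inferred from the example path:

```py
from pathlib import Path


def latest_checkpoint(work_dir: str) -> Path:
    """Return the `step<N>` subdirectory with the largest step count."""
    steps = [
        p for p in Path(work_dir).iterdir()
        if p.is_dir() and p.name.startswith("step") and p.name[4:].isdigit()
    ]
    if not steps:
        raise FileNotFoundError(f"no step* checkpoints under {work_dir}")
    # Compare by numeric step so step10450 beats step950.
    return max(steps, key=lambda p: int(p.name[4:]))
```

For example, `str(latest_checkpoint('work_dirs/kandinsky_v22_prior_pokemon_blip'))` could replace the hard-coded `checkpoint` string below.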
| 35 | + |
```py
import torch
from diffusers import AutoPipelineForText2Image, PriorTransformer

prompt = 'yoda pokemon'
checkpoint = 'work_dirs/kandinsky_v22_prior_pokemon_blip/step10450'

# Load the fine-tuned prior from the training checkpoint.
prior = PriorTransformer.from_pretrained(
    checkpoint, subfolder="prior",
)
# Components of the prior sub-pipeline are passed to the combined
# pipeline with the `prior_` prefix, hence `prior_prior`.
pipe = AutoPipelineForText2Image.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder",
    prior_prior=prior,
    torch_dtype=torch.float32,
)
pipe.to('cuda')

image = pipe(
    prompt,
    num_inference_steps=50,
    width=512,
    height=512,
).images[0]
image.save('demo.png')
```

You can see more details on [`docs/source/run_guides/run_kandinsky_v22.md`](../../docs/source/run_guides/run_kandinsky_v22.md#inference-with-diffusers).

## Results Example

#### kandinsky_v22_prior_pokemon_blip


#### kandinsky_v22_decoder_pokemon_blip
