Improving Diffusion Models for Authentic Virtual Try-on in the Wild

Cited by: 2
Authors
Choi, Yisol [1 ]
Kwak, Sangkyung [1 ]
Lee, Kyungmin [1 ]
Choi, Hyungwon [2 ]
Shin, Jinwoo [1 ]
Affiliations
[1] Korea Adv Inst Sci & Technol KAIST, Daejeon, South Korea
[2] OMNIOUS AI, Seoul, South Korea
Source
COMPUTER VISION - ECCV 2024, PT LXXXVI | 2025 / Vol. 15144
Keywords
Virtual Try-On; Diffusion Models;
DOI
10.1007/978-3-031-73016-0_13
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
This paper considers image-based virtual try-on, which renders an image of a person wearing a curated garment, given a pair of images depicting the person and the garment, respectively. Previous works adapt existing exemplar-based inpainting diffusion models for virtual try-on to improve the naturalness of the generated visuals compared to other methods (e.g., GAN-based), but they fail to preserve the identity of the garments. To overcome this limitation, we propose a novel diffusion model that improves garment fidelity and generates authentic virtual try-on images. Our method, coined IDM-VTON, uses two different modules to encode the semantics of the garment image; given the base UNet of the diffusion model, 1) the high-level semantics extracted from a visual encoder are fused into the cross-attention layer, and then 2) the low-level features extracted from a parallel UNet are fused into the self-attention layer. In addition, we provide detailed textual prompts for both garment and person images to enhance the authenticity of the generated visuals. Finally, we present a customization method using a pair of person-garment images, which significantly improves fidelity and authenticity. Our experimental results show that our method outperforms previous approaches (both diffusion-based and GAN-based) in preserving garment details and generating authentic virtual try-on images, both qualitatively and quantitatively. Furthermore, the proposed customization method demonstrates its effectiveness in a real-world scenario. More visualizations are available on our project page.
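The low-level fusion described in the abstract (point 2) amounts to letting the person image's self-attention queries also attend to garment features produced by a parallel UNet, i.e., concatenating the garment tokens into the keys and values. The following is a minimal, framework-free sketch of that idea only; all function names and the toy token vectors are illustrative assumptions, not the paper's actual implementation.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention(q, keys, values):
    """Scaled dot-product attention for a single query vector."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    w = softmax(scores)
    # Weighted sum of value vectors.
    return [sum(wi * v[j] for wi, v in zip(w, values)) for j in range(len(values[0]))]

def fused_self_attention(person_tokens, garment_tokens):
    """Sketch of low-level fusion: garment features from a parallel UNet are
    concatenated into the keys/values of the base UNet's self-attention, so
    each person token can attend directly to garment detail."""
    kv = person_tokens + garment_tokens  # keys/values span both feature sets
    return [attention(q, kv, kv) for q in person_tokens]

# Toy example: two person tokens, one garment token, feature dim 2.
person = [[1.0, 0.0], [0.0, 1.0]]
garment = [[1.0, 1.0]]
fused = fused_self_attention(person, garment)
```

The high-level fusion (point 1) is analogous but uses cross-attention: the queries come from the UNet features while keys/values come from the visual encoder's garment embedding.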
Pages: 206-235
Page count: 30