Improving Diffusion Models for Authentic Virtual Try-on in the Wild

Cited by: 2
Authors
Choi, Yisol [1 ]
Kwak, Sangkyung [1 ]
Lee, Kyungmin [1 ]
Choi, Hyungwon [2 ]
Shin, Jinwoo [1 ]
Affiliations
[1] Korea Adv Inst Sci & Technol KAIST, Daejeon, South Korea
[2] OMNIOUS AI, Seoul, South Korea
Source
COMPUTER VISION - ECCV 2024, PT LXXXVI | 2025 / Vol. 15144
Keywords
Virtual Try-On; Diffusion Models;
DOI
10.1007/978-3-031-73016-0_13
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
This paper considers image-based virtual try-on, which renders an image of a person wearing a curated garment, given a pair of images depicting the person and the garment, respectively. Previous works adapt existing exemplar-based inpainting diffusion models for virtual try-on to improve the naturalness of the generated visuals compared to other methods (e.g., GAN-based), but they fail to preserve the identity of the garments. To overcome this limitation, we propose a novel diffusion model that improves garment fidelity and generates authentic virtual try-on images. Our method, coined IDM-VTON, uses two different modules to encode the semantics of the garment image; given the base UNet of the diffusion model, 1) the high-level semantics extracted from a visual encoder are fused into the cross-attention layer, and then 2) the low-level features extracted from a parallel UNet are fused into the self-attention layer. In addition, we provide detailed textual prompts for both garment and person images to enhance the authenticity of the generated visuals. Finally, we present a customization method using a pair of person-garment images, which significantly improves fidelity and authenticity. Our experimental results show that our method outperforms previous approaches (both diffusion-based and GAN-based) in preserving garment details and generating authentic virtual try-on images, both qualitatively and quantitatively. Furthermore, the proposed customization method demonstrates its effectiveness in a real-world scenario. More visualizations are available on our project page.
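The low-level fusion described in the abstract (point 2) amounts to letting the person image's self-attention queries also attend to garment features produced by a parallel UNet, i.e., concatenating the garment tokens into the keys and values. The following is a minimal, framework-free sketch of that idea only; all function names and the toy token vectors are illustrative assumptions, not the paper's actual implementation.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention(q, keys, values):
    """Scaled dot-product attention for a single query vector."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    w = softmax(scores)
    # Weighted sum of value vectors.
    return [sum(wi * v[j] for wi, v in zip(w, values)) for j in range(len(values[0]))]

def fused_self_attention(person_tokens, garment_tokens):
    """Sketch of low-level fusion: garment features from a parallel UNet are
    concatenated into the keys/values of the base UNet's self-attention, so
    each person token can attend directly to garment detail."""
    kv = person_tokens + garment_tokens  # keys/values span both feature sets
    return [attention(q, kv, kv) for q in person_tokens]

# Toy example: two person tokens, one garment token, feature dim 2.
person = [[1.0, 0.0], [0.0, 1.0]]
garment = [[1.0, 1.0]]
fused = fused_self_attention(person, garment)
```

The high-level fusion (point 1) is analogous but uses cross-attention: the queries come from the UNet features while keys/values come from the visual encoder's garment embedding.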
Pages: 206-235
Page count: 30