Slot-VTON: subject-driven diffusion-based virtual try-on with slot attention

Cited by: 4
Authors
Ye, Jianglei [1]
Wang, Yigang [1]
Xie, Fengmao [2]
Wang, Qin [1]
Gu, Xiaoling [3]
Wu, Zizhao [1]
Affiliations
[1] Hangzhou Dianzi Univ, Sch Digital Media Technol, Hangzhou, Peoples R China
[2] Hangzhou Dianzi Univ, Sch Elect & Informat Engn, Hangzhou, Peoples R China
[3] Hangzhou Dianzi Univ, Sch Comp Sci, Hangzhou, Peoples R China
Keywords
Virtual try-on; Diffusion models; Generative models; Slot attention; High-resolution image synthesis
DOI
10.1007/s00371-024-03603-z
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Virtual try-on aims to transfer clothes from one image to another while preserving intricate wearer and clothing details. Considerable effort has gone into the task using deep generative models such as GANs and diffusion models; however, current methods do not account for the influence of the natural environment (background and unrelated impurities) on the clothing image, which leads to the loss of details such as intricate textures, shadows, and folds. In this paper, we introduce Slot-VTON, a slot attention-based inpainting approach for seamless image generation in a subject-driven way. Specifically, we adopt an attention mechanism, termed slot attention, that separates the various subjects within an image without supervision. With slot attention, we distill the clothing image into a series of slot representations, where each slot represents a subject. Guided by the extracted clothing slot, our method can eliminate the interference of irrelevant factors and thereby better preserve the complex details of the clothing. To further enhance the seamless generation of the diffusion model, we design a fusion adapter that integrates multiple conditions, including the slot and other added clothing conditions. In addition, a non-garment inpainting module fixes visible seams and preserves details in non-clothing areas (hands, neck, etc.). Experiments on the VITON-HD dataset validate the efficacy of our method, showing state-of-the-art generation performance. Our implementation is available at: https://github.com/SilverLakee/Slot-VTON.
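To make the slot attention idea in the abstract more concrete, the following is a minimal sketch of one generic slot-attention iteration in plain Python. It is not the Slot-VTON implementation: the function names (`slot_attention_step`, `run_slot_attention`) are hypothetical, and the sketch assumes identity projections for queries, keys, and values and replaces the learned slot update of the usual formulation with a plain weighted mean. The one property it does illustrate is that the softmax is taken over the slot axis, so slots compete for each input feature.

```python
import math
import random


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def slot_attention_step(inputs, slots, eps=1e-8):
    """One simplified slot-attention iteration (assumption: no learned layers).

    inputs: N feature vectors of size D.
    slots:  K slot vectors of size D.
    Returns (new_slots, attn), where attn is an N x K matrix.
    """
    dim = len(slots[0])
    scale = 1.0 / math.sqrt(dim)
    # logits[i][j] is the scaled similarity between input i and slot j.
    logits = [[scale * dot(x, s) for s in slots] for x in inputs]
    # Softmax over the slot axis: each input distributes its mass across slots.
    attn = [softmax(row) for row in logits]
    new_slots = []
    for j in range(len(slots)):
        # Normalise over inputs so each slot becomes a weighted mean of features.
        col_sum = sum(attn[i][j] for i in range(len(inputs))) + eps
        new_slots.append([
            sum(attn[i][j] * inputs[i][d] for i in range(len(inputs))) / col_sum
            for d in range(dim)
        ])
    return new_slots, attn


def run_slot_attention(inputs, num_slots, iters=3, seed=0):
    """Initialise slots from a Gaussian and refine them for a few iterations."""
    rng = random.Random(seed)
    dim = len(inputs[0])
    slots = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_slots)]
    attn = []
    for _ in range(iters):
        slots, attn = slot_attention_step(inputs, slots)
    return slots, attn


# Toy "image features": four 2-D vectors, to be distilled into two slots.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
slots, attn = run_slot_attention(features, num_slots=2)
```

In a try-on setting, `features` would stand for encoded clothing-image features, and one of the resulting slots would be taken as the clothing subject; how Slot-VTON selects and conditions on that slot is not specified in this record.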
Pages: 3297-3308
Page count: 12