Leveraging Style and Content features for Text Conditioned Image Retrieval

Cited by: 4
Authors
Chawla, Pranit [1]
Jandial, Surgan [2]
Badjatiya, Pinkesh [3]
Chopra, Ayush [4]
Sarkar, Mausoom [3]
Krishnamurthy, Balaji [3]
Affiliations
[1] IIT Kharagpur, Kharagpur, W Bengal, India
[2] IIT Hyderabad, Kandi, Telangana, India
[3] Adobe, Media & Data Sci Res Lab, San Jose, CA USA
[4] MIT, Cambridge, MA 02139 USA
Source
2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2021 | 2021
DOI
10.1109/CVPRW53098.2021.00448
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Image search is a fundamental task that plays a significant role in the success of a wide variety of frameworks and applications. However, as product catalogues grow and the number of attributes per product increases, it has become difficult for users to express their needs effectively. We therefore focus on the problem of image retrieval with text feedback, which involves retrieving modified images according to natural language feedback provided by users. In this work, we hypothesise that since an image can be delineated by its content and style features, modifications to the image can also take place in these two subspaces. Hence, we decompose an input image into its corresponding style and content features, apply the modification from the text feedback separately in the style and content spaces, and finally fuse them for retrieval. Our experiments show that our approach outperforms TIRG, a recent state-of-the-art method for this task, which uses a single vector rather than applying the text modification over the style and content spaces separately.
Pages: 3973-3977
Page count: 5