Cross-domain Image Retrieval with a Dual Attribute-aware Ranking Network

Cited: 254
Authors
Huang, Junshi [1 ]
Feris, Rogerio [2 ]
Chen, Qiang [3 ]
Yan, Shuicheng [1 ]
Affiliations
[1] Natl Univ Singapore, Singapore 117548, Singapore
[2] IBM TJ Watson Res Ctr, New York, NY USA
[3] IBM Res, Melbourne, Vic, Australia
Source
2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV) | 2015
DOI
10.1109/ICCV.2015.127
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We address the problem of cross-domain image retrieval, considering the following practical application: given a user photo depicting a clothing item, our goal is to retrieve the same or attribute-similar clothing items from online shopping stores. This is a challenging problem due to the large discrepancy between online shopping images, usually taken in ideal lighting/pose/background conditions, and user photos captured in uncontrolled conditions. To address this problem, we propose a Dual Attribute-aware Ranking Network (DARN) for retrieval feature learning. More specifically, DARN consists of two sub-networks, one for each domain, whose retrieval feature representations are driven by semantic attribute learning. We show that this attribute-guided learning is a key factor for retrieval accuracy improvement. In addition, to further align with the nature of the retrieval problem, we impose a triplet visual similarity constraint for learning to rank across the two sub-networks. Another contribution of our work is a large-scale dataset, which makes the network learning feasible. We exploit customer review websites to crawl a large set of online shopping images and corresponding offline user photos with fine-grained clothing attributes, i.e., around 450,000 online shopping images and about 90,000 exact offline counterpart images of those online ones. All these images are collected from real-world consumer websites, reflecting the diversity of the data modality, which makes this dataset unique and rare in the academic community. We extensively evaluate the retrieval performance of networks in different configurations. The top-20 retrieval accuracy is doubled when using the proposed DARN rather than the currently popular solution of using pre-trained CNN features only (0.570 vs. 0.268).
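To make the dual-branch idea concrete, the following is a minimal sketch of a two-branch, attribute-supervised triplet ranking setup as described in the abstract: one sub-network per domain (user photos vs. shop images), attribute classifiers guiding each branch's retrieval feature, and a cross-domain triplet loss for ranking. The backbone layers, embedding/attribute dimensions, margin, and loss weight here are illustrative assumptions, not the paper's exact architecture or hyperparameters.

```python
import torch
import torch.nn as nn

class AttributeAwareBranch(nn.Module):
    """One domain-specific sub-network: a small CNN backbone whose features feed
    both per-attribute classifiers and a retrieval embedding (dimensions are assumptions)."""
    def __init__(self, feat_dim=512, embed_dim=128, attr_dims=(20, 10, 8)):
        super().__init__()
        self.backbone = nn.Sequential(                      # placeholder conv backbone
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # one classifier head per semantic attribute (e.g. color, sleeve length, pattern)
        self.attr_heads = nn.ModuleList(nn.Linear(feat_dim, d) for d in attr_dims)
        self.embed = nn.Linear(feat_dim, embed_dim)         # retrieval feature

    def forward(self, x):
        f = self.backbone(x)
        return self.embed(f), [head(f) for head in self.attr_heads]

# Dual branches: one for user photos (queries), one for online shop images (gallery).
user_net, shop_net = AttributeAwareBranch(), AttributeAwareBranch()
triplet = nn.TripletMarginLoss(margin=0.3)                  # margin value is an assumption
ce = nn.CrossEntropyLoss()

def darn_style_loss(user_img, shop_pos, shop_neg, user_attrs, shop_attrs, w_attr=0.1):
    """Cross-domain triplet ranking loss plus attribute supervision on both branches."""
    q_emb, q_logits = user_net(user_img)                    # query from the user-photo branch
    p_emb, p_logits = shop_net(shop_pos)                    # matching shop image (positive)
    n_emb, _ = shop_net(shop_neg)                           # non-matching shop image (negative)
    loss = triplet(q_emb, p_emb, n_emb)
    for logits, target in zip(q_logits, user_attrs):        # attribute-guided learning, user branch
        loss = loss + w_attr * ce(logits, target)
    for logits, target in zip(p_logits, shop_attrs):        # attribute-guided learning, shop branch
        loss = loss + w_attr * ce(logits, target)
    return loss
```

At retrieval time, only the embedding outputs would be used: a user photo is embedded by its branch and matched against pre-computed embeddings of the shop gallery by distance ranking.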
Pages: 1062-1070
Number of pages: 9