Localized Triplet Loss for Fine-grained Fashion Image Retrieval

Cited by: 14

Authors:

D'Innocente, Antonio [1,2]

Garg, Nikhil [2]

Zhang, Yuan [3]

Bazzani, Loris [2]

Donoser, Michael [2]

Affiliations:

[1] Sapienza Univ Rome, Rome, Italy

[2] Amazon, Munich, Germany

[3] Amazon, Seattle, WA USA

Source:

2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2021 | 2021

DOI:

10.1109/CVPRW53098.2021.00435

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Fashion retrieval methods aim to learn a clothing-specific embedding space in which images are ranked by their global visual similarity to a given query. However, because global embeddings are produced by aggregation operations, they struggle to capture localized, fine-grained similarities between images. Our work addresses this problem by learning localized representations for fashion retrieval, based on user-specified interest points on prominent visual features. We introduce a localized triplet loss function that compares samples based on corresponding patterns. We apply random local perturbations to the interest point as a key regularization technique that enforces local invariance of the visual representations. Because no existing fashion dataset supports training localized representations, we introduce FashionLocalTriplets, a new high-quality dataset annotated by fashion specialists that contains triplets of women's dresses together with interest points. The proposed model outperforms state-of-the-art global representations on FashionLocalTriplets.
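The abstract does not give the exact form of the localized triplet loss, so the following is only a minimal sketch of one possible reading, not the authors' implementation. It assumes: (a) each image is represented by a backbone feature map of shape (C, H, W); (b) the anchor's local descriptor is read at the feature-map cell nearest to the user's interest point; (c) the "corresponding pattern" in the positive and negative images is taken to be their best-matching spatial location under cosine similarity; and (d) the random local perturbation is a uniform jitter of the interest point within a hypothetical `radius`. All function names and the `margin` and `radius` values are hypothetical.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors along `axis` to unit L2 norm (eps avoids division by zero)."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def local_descriptor(fmap, point):
    """Assumption: read a unit-norm C-dim descriptor from a (C, H, W) feature map
    at the cell nearest to `point`, given as (x, y) in feature-map coordinates."""
    _, h, w = fmap.shape
    x = int(np.clip(np.rint(point[0]), 0, w - 1))
    y = int(np.clip(np.rint(point[1]), 0, h - 1))
    return l2_normalize(fmap[:, y, x])


def best_match_similarity(query_vec, fmap):
    """Assumption: the 'corresponding pattern' is the spatial cell of `fmap`
    whose descriptor has the highest cosine similarity to `query_vec`."""
    c, h, w = fmap.shape
    cells = l2_normalize(fmap.reshape(c, h * w).T)  # (H*W, C), one row per cell
    return float(np.max(cells @ query_vec))


def jitter(point, radius, rng):
    """Hypothetical perturbation: shift (x, y) uniformly within +/- radius."""
    return np.asarray(point, dtype=float) + rng.uniform(-radius, radius, size=2)


def localized_triplet_loss(f_anchor, f_pos, f_neg, point, margin=0.2,
                           radius=0.0, rng=None):
    """Hinge triplet loss on similarities of local (not global) descriptors."""
    if radius > 0:
        rng = rng if rng is not None else np.random.default_rng()
        point = jitter(point, radius, rng)  # regularize the interest point
    q = local_descriptor(f_anchor, point)
    s_pos = best_match_similarity(q, f_pos)
    s_neg = best_match_similarity(q, f_neg)
    # Penalize unless the positive matches the local pattern better than
    # the negative by at least `margin`.
    return max(0.0, margin - s_pos + s_neg)
```

In this sketch, a positive image that contains the anchor's local pattern anywhere in its feature map scores a similarity close to 1 regardless of where the pattern appears, which is one way a loss could compare "corresponding patterns" rather than whole-image embeddings. The jitter is applied only during training; at retrieval time one would call the function with `radius=0.0`.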


Pages: 3905-3910

Number of pages: 6

Total references: 32
