Learning to Embed Semantic Similarity for Joint Image-Text Retrieval

Cited by: 6

Authors:

Malali, Noam [1]

Keller, Yosi [1]

Affiliation:

[1] Bar Ilan Univ, Fac Engn, IL-5290002 Ramat Gan, Israel

Source:

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE | 2022 / Vol. 44 / No. 12

Keywords:

Text and image fusion; deep learning; joint embedding

DOI:

10.1109/TPAMI.2021.3132163

Chinese Library Classification (CLC) code:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

We present a deep learning approach for learning joint semantic embeddings of images and captions in a Euclidean space, such that semantic similarity is approximated by L2 distances in the embedding space. To this end, we introduce a metric learning scheme that uses multitask learning and a center loss to learn the embedding of identical semantic concepts. By introducing a differentiable quantization scheme into the end-to-end trainable network, we derive a semantic embedding of semantically similar concepts in Euclidean space. We also propose a novel metric learning formulation using an adaptive-margin hinge loss whose margin is refined during the training phase. The proposed scheme was applied to the MS-COCO, Flickr30K, and Flickr8K datasets, and was shown to compare favorably with contemporary state-of-the-art approaches.
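The abstract names two standard metric-learning ingredients: a center loss that pulls embeddings of the same semantic concept together, and a margin-based hinge loss over L2 distances. Below is a minimal Python sketch of their commonly used textbook forms, assuming embeddings are plain lists of floats. It is not the authors' implementation: all function names are hypothetical, the margin is passed in as a fixed value because the abstract does not say how the adaptive margin is refined, the bidirectional image/text formulation is an assumed common choice rather than one confirmed for this paper, and the differentiable quantization step is not modeled.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical helpers illustrating textbook loss forms; these are NOT the
# authors' code, and embeddings are plain lists of floats for simplicity.


def l2_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean (L2) distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def center_loss(embeddings: List[Sequence[float]],
                labels: List[int],
                centers: Dict[int, Sequence[float]]) -> float:
    """Common center-loss form: half the sum of squared L2 distances between
    each embedding and the (here fixed) center of its semantic class."""
    return 0.5 * sum(l2_distance(x, centers[y]) ** 2
                     for x, y in zip(embeddings, labels))


def hinge_triplet_loss(anchor: Sequence[float],
                       positive: Sequence[float],
                       negative: Sequence[float],
                       margin: float) -> float:
    """Zero once the negative is at least `margin` farther from the anchor
    than the positive; otherwise grows linearly with the violation.
    Assumption: `margin` is a fixed input, since the abstract does not
    specify how the adaptive margin is refined during training."""
    return max(0.0, margin + l2_distance(anchor, positive)
               - l2_distance(anchor, negative))


def bidirectional_hinge_loss(image: Sequence[float],
                             caption: Sequence[float],
                             negative_caption: Sequence[float],
                             negative_image: Sequence[float],
                             margin: float) -> float:
    """Image-to-text plus text-to-image hinge terms (an assumed, commonly
    used formulation; not confirmed as this paper's exact loss)."""
    return (hinge_triplet_loss(image, caption, negative_caption, margin)
            + hinge_triplet_loss(caption, image, negative_image, margin))


if __name__ == "__main__":
    image, caption = [0.0, 0.0], [3.0, 4.0]  # matching pair, distance 5.0
    print(hinge_triplet_loss(image, caption, [6.0, 8.0], margin=1.0))  # 0.0
    print(hinge_triplet_loss(image, caption, [4.0, 3.0], margin=1.0))  # 1.0
    print(bidirectional_hinge_loss(image, caption, [4.0, 3.0], [3.0, 0.0],
                                   margin=1.0))  # 3.0
    print(center_loss([[1.0, 0.0], [0.0, 1.0]], [0, 0], {0: [0.0, 0.0]}))  # 1.0
```

In the usage lines, a non-matching caption already 10.0 away from the image incurs no penalty, while one as close as the true caption incurs exactly the margin. The class centers are held fixed in this sketch; in the usual center-loss setup they are themselves updated during training.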


Pages: 10252–10260

Page count: 9

References: 42 entries
