INTRA-MODAL CONSTRAINT LOSS FOR IMAGE-TEXT RETRIEVAL

Times cited: 8
Authors
Chen, Jianan [1]
Zhang, Lu [1]
Wang, Qiong [1]
Bai, Cong [2]
Kpalma, Kidiyo [1]
Affiliations
[1] Univ Rennes, INSA Rennes, CNRS, IETR UMR 6164, F-35000 Rennes, France
[2] Zhejiang Univ Technol, Coll Comp Sci, Hangzhou, Peoples R China
Source
2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP | 2022
Funding
National Natural Science Foundation of China
Keywords
Cross-modal retrieval; image-text retrieval; intra-modal constraint; positive pairs; similarity distance;
DOI
10.1109/ICIP46576.2022.9897195
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Cross-modal retrieval has drawn much attention in both the computer vision and natural language processing communities. With the development of convolutional and recurrent neural networks, the bottleneck in retrieval across image and text modalities is no longer the extraction of image and text features but the design of an efficient loss function for learning in the embedding space. Many loss functions try to pull paired features from heterogeneous modalities closer together. This paper proposes a method for learning a joint embedding of images and texts using an intra-modal constraint loss function that reduces margin violations by negative pairs from the same homogeneous modality. Experimental results show that our approach outperforms state-of-the-art bi-directional image-text retrieval methods on the Flickr30K and Microsoft COCO datasets. Our code is publicly available(1).
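The abstract names the idea but not the formula. As a rough illustration only, here is a minimal NumPy sketch of one plausible reading: a bidirectional hinge-based triplet loss over a batch of matched image/text embeddings, extended with two intra-modal terms that require each positive pair's cross-modal similarity to exceed, by a margin, the similarity between an anchor and negatives from its own modality. The function names (`l2_normalize`, `hinge`, `intra_modal_constraint_loss`), cosine similarity, the margin of 0.2, and summing over all in-batch negatives are assumptions for illustration; this is not the authors' released code or their exact formulation.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def hinge(pos: np.ndarray, neg: np.ndarray, margin: float) -> np.ndarray:
    """Margin violation [margin - s(positive) + s(negative)]_+, with broadcasting."""
    return np.maximum(0.0, margin - pos + neg)


def intra_modal_constraint_loss(img: np.ndarray, txt: np.ndarray,
                                margin: float = 0.2) -> float:
    """Hypothetical sketch: img and txt are (N, D); row k of img matches row k of txt."""
    img = l2_normalize(img)
    txt = l2_normalize(txt)
    n = img.shape[0]

    s_it = img @ txt.T   # s_it[k, j] = s(i_k, t_j), cross-modal similarities
    s_ii = img @ img.T   # image-image similarities (same modality)
    s_tt = txt @ txt.T   # text-text similarities (same modality)
    pos = np.diag(s_it)  # pos[k] = s(i_k, t_k), the positive pairs
    off = ~np.eye(n, dtype=bool)  # masks out positives and self-similarities

    # Cross-modal terms: image anchor i_k vs negative texts t_j,
    # and text anchor t_j vs negative images i_k.
    cost_i2t = hinge(pos[:, None], s_it, margin)
    cost_t2i = hinge(pos[None, :], s_it, margin)
    # Assumed intra-modal terms: anchor vs negatives from its own modality.
    cost_ii = hinge(pos[:, None], s_ii, margin)
    cost_tt = hinge(pos[:, None], s_tt, margin)

    total = sum(float((c * off).sum()) for c in (cost_i2t, cost_t2i, cost_ii, cost_tt))
    return total / n
```

For example, `intra_modal_constraint_loss(np.eye(3), np.eye(3))` returns 0.0 because every positive pair already beats every negative by more than the margin, while a batch in which all embeddings collapse onto one vector is penalized by every term. Summing over all negatives keeps the sketch short; VSE++ (Faghri et al., 2018) instead keeps only the hardest negative per anchor, which would replace the masked sums with masked row or column maxima.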
Pages: 4023-4027
Number of pages: 5