Integrating listwise ranking into pairwise-based image-text retrieval

Cited by: 4
Authors
Li, Zheng [1]
Guo, Caili [1]
Wang, Xin [1]
Zhang, Hao [1]
Wang, Yanjun [2]
Affiliations
[1] Beijing Univ Posts & Telecommun, Beijing, Peoples R China
[2] China Telecom Digital Intelligence Technol Co Ltd, Beijing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Image-text retrieval; Pairwise approach; Listwise approach; Relevance ranking;
DOI
10.1016/j.knosys.2024.111431
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Image-Text Retrieval (ITR) is essentially a ranking problem. Given a query caption, the goal is to rank candidate images by relevance, from large to small. The current ITR datasets are constructed in a pairwise manner. Image-text pairs are annotated as positive or negative. Correspondingly, ITR models mainly use pairwise losses, such as triplet loss, to learn to rank. Pairwise-based ITR increases positive pair similarity while decreasing negative pair similarity indiscriminately. However, the relevance between dissimilar negative pairs is different. Pairwise annotations cannot reflect this difference in relevance. In the current datasets, pairwise annotations miss many correlations. There are many potential positive pairs among the pairs labeled as negative. Pairwise-based ITR can only rank positive samples before negative samples, but cannot rank negative samples by relevance. In this paper, we integrate listwise ranking into conventional pairwise-based ITR. Listwise ranking optimizes the entire ranking list based on relevance scores. Specifically, we first propose a Relevance Score Calculation (RSC) module to calculate the relevance score of the entire ranked list. Then we choose the ranking metric, Normalised Discounted Cumulative Gain (NDCG), as the optimization objective. We apply a metric smoothing method named Smooth-NDCG (S-NDCG) to ITR, which transforms the non-differentiable NDCG into a differentiable listwise loss. Our listwise ranking approach can be plug-and-play integrated into current pairwise-based ITR models. Experiments on ITR benchmarks show that integrating listwise ranking can improve the performance of current ITR models and provide more user-friendly retrieval results. The code is available at https://github.com/AAA-Zheng/Listwise_ITR.
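The idea of turning NDCG into a smooth listwise objective can be illustrated with a small sketch. This is not the authors' S-NDCG implementation; it assumes the common rank-smoothing trick of replacing the hard "is candidate j ranked above candidate i" indicator with a sigmoid, and it assumes the exponential gain form 2^rel - 1. The function names, the temperature `tau`, and the relevance values are all hypothetical, chosen for illustration only.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def dcg(relevance_in_rank_order):
    # Discounted cumulative gain with exponential gain (an assumed convention).
    return sum((2.0 ** rel - 1.0) / math.log2(1.0 + rank)
               for rank, rel in enumerate(relevance_in_rank_order, start=1))


def ndcg(similarities, relevance):
    # Hard NDCG: sort candidates by similarity, normalise by the ideal ordering.
    order = sorted(range(len(similarities)), key=lambda i: -similarities[i])
    ideal = dcg(sorted(relevance, reverse=True))
    return dcg([relevance[i] for i in order]) / ideal if ideal > 0 else 0.0


def smooth_ranks(similarities, tau=0.01):
    # Soft rank of candidate i: 1 + sum over j != i of sigmoid((s_j - s_i) / tau).
    # As tau shrinks, each sigmoid approaches the hard indicator [s_j > s_i].
    n = len(similarities)
    return [1.0 + sum(sigmoid((similarities[j] - similarities[i]) / tau)
                      for j in range(n) if j != i)
            for i in range(n)]


def smooth_ndcg(similarities, relevance, tau=0.01):
    # NDCG computed from soft ranks, so it varies smoothly with the similarities.
    ranks = smooth_ranks(similarities, tau)
    ideal = dcg(sorted(relevance, reverse=True))
    gain = sum((2.0 ** rel - 1.0) / math.log2(1.0 + r)
               for rel, r in zip(relevance, ranks))
    return gain / ideal if ideal > 0 else 0.0


# Hypothetical graded relevance scores for three candidate images.
relevance = [1.0, 0.0, 0.6]
good_similarities = [0.9, 0.2, 0.5]   # ordering agrees with relevance
bad_similarities = [0.2, 0.9, 0.5]    # most relevant image ranked last

# A listwise loss in this spirit is 1 minus the smoothed NDCG.
good_loss = 1.0 - smooth_ndcg(good_similarities, relevance)
bad_loss = 1.0 - smooth_ndcg(bad_similarities, relevance)
```

In an actual training setup the same computation would be written with an autodiff framework so that gradients flow from the loss back to the similarity scores; the pure-Python version here only shows the arithmetic.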
Pages: 13