Multi-hop Interactive Cross-Modal Retrieval

Cited by: 0
Authors
Ning, Xuecheng [1 ]
Yang, Xiaoshan [2 ,3 ,4 ]
Xu, Changsheng [1 ,2 ,3 ,4 ]
Affiliations
[1] HeFei Univ Technol, Hefei, Peoples R China
[2] Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing, Peoples R China
[3] Univ Chinese Acad Sci, Beijing, Peoples R China
[4] Peng Cheng Lab, Shenzhen, Peoples R China
Source
Funding
National Natural Science Foundation of China;
Keywords
Cross-modal retrieval; Deep learning; LSTMs;
DOI
10.1007/978-3-030-37734-2_55
CLC Number
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Conventional representation-learning-based cross-modal retrieval approaches typically represent a sentence with a single global embedding, which neglects the local correlations between objects in the image and phrases in the sentence. In this paper, we present a novel Multi-hop Interactive Cross-modal Retrieval Model (MICRM), which interactively exploits the local correlations between images and words. We design a multi-hop interactive module to infer the high-order relevance between the image and the sentence. Experimental results on two benchmark datasets, MS-COCO and Flickr30K, demonstrate that our multi-hop interactive model performs significantly better than several competitive cross-modal retrieval methods.
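To make the idea of multi-hop interaction concrete, the following is a minimal numpy sketch of how a query vector might alternately attend over image-region features and word features across several hops, then score image-sentence relevance. All function and variable names here are illustrative assumptions; this is not the paper's actual MICRM architecture, only a toy instance of the general multi-hop attention pattern.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_hop_relevance(regions, words, num_hops=3):
    """Toy multi-hop interaction (illustrative, not the paper's model).

    regions: (k, d) array of image-region features.
    words:   (n, d) array of word features.
    Returns a scalar relevance score in [-1, 1].
    """
    query = words.mean(axis=0)              # initialize query from the sentence
    for _ in range(num_hops):
        # Hop part 1: attend over image regions with the current query.
        attn_v = softmax(regions @ query)   # (k,) attention weights
        v_ctx = attn_v @ regions            # attended visual context, (d,)
        # Hop part 2: attend over words with the visual context.
        attn_w = softmax(words @ v_ctx)     # (n,) attention weights
        w_ctx = attn_w @ words              # attended textual context, (d,)
        query = query + v_ctx + w_ctx       # residual refinement of the query
    # Relevance: cosine similarity between the refined query and the image.
    img = regions.mean(axis=0)
    denom = np.linalg.norm(query) * np.linalg.norm(img) + 1e-8
    return float(query @ img / denom)

# Example with random features standing in for CNN / LSTM outputs.
rng = np.random.default_rng(0)
regions = rng.standard_normal((36, 64))     # e.g. 36 region features
words = rng.standard_normal((12, 64))       # e.g. 12 word features
score = multi_hop_relevance(regions, words)
```

Each additional hop lets the query condition on context gathered in earlier hops, which is what allows such a module to capture higher-order image-sentence relevance rather than a single global match.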
Pages: 681-693
Page count: 13
Related Papers
50 records in total
  • [1] Multi-Granularity Interactive Transformer Hashing for Cross-modal Retrieval
    Liu, Yishu
    Wu, Qingpeng
    Zhang, Zheng
    Zhang, Jingyi
    Lu, Guangming
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 893 - 902
  • [2] Multi-Label Cross-modal Retrieval
    Ranjan, Viresh
    Rasiwasia, Nikhil
    Jawahar, C. V.
    2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015, : 4094 - 4102
  • [3] Multi-modal and cross-modal for lecture videos retrieval
    Nhu Van Nguyen
    Coustaty, Mickaël
    Ogier, Jean-Marc
    2014 22ND INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2014, : 2667 - 2672
  • [4] Multi-modal semantic autoencoder for cross-modal retrieval
    Wu, Yiling
    Wang, Shuhui
    Huang, Qingming
    NEUROCOMPUTING, 2019, 331 : 165 - 175
  • [5] Unsupervised Multi-modal Hashing for Cross-Modal Retrieval
    Jun Yu
    Xiao-Jun Wu
    Donglin Zhang
    Cognitive Computation, 2022, 14 : 1159 - 1171
  • [6] Unsupervised Multi-modal Hashing for Cross-Modal Retrieval
    Yu, Jun
    Wu, Xiao-Jun
    Zhang, Donglin
    COGNITIVE COMPUTATION, 2022, 14 (03) : 1159 - 1171
  • [7] Cross-Modal Retrieval Augmentation for Multi-Modal Classification
    Gur, Shir
    Neverova, Natalia
    Stauffer, Chris
    Lim, Ser-Nam
    Kiela, Douwe
    Reiter, Austin
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2021, 2021, : 111 - 123
  • [8] Heterogeneous Interactive Learning Network for Unsupervised Cross-Modal Retrieval
    Zheng, Yuanchao
    Zhang, Xiaowei
    COMPUTER VISION - ACCV 2022, PT IV, 2023, 13844 : 692 - 707
  • [9] Adversarial Cross-Modal Retrieval
    Wang, Bokun
    Yang, Yang
    Xu, Xing
    Hanjalic, Alan
    Shen, Heng Tao
    PROCEEDINGS OF THE 2017 ACM MULTIMEDIA CONFERENCE (MM'17), 2017, : 154 - 162
  • [10] An Interactive Framework for Cross-modal Attribute-based Person Retrieval
    Specker, Andreas
    Schumann, Arne
    Beyerer, Juergen
    2019 16TH IEEE INTERNATIONAL CONFERENCE ON ADVANCED VIDEO AND SIGNAL BASED SURVEILLANCE (AVSS), 2019,