Multi-Modality Cross Attention Network for Image and Sentence Matching

Cited by: 238
Authors
Wei, Xi [1 ]
Zhang, Tianzhu [1 ]
Li, Yan [2 ]
Zhang, Yongdong [1 ]
Wu, Feng [1 ]
Affiliations
[1] Univ Sci & Technol China, Hefei, Anhui, Peoples R China
[2] Kuaishou Technol, Beijing, Peoples R China
Keywords
DOI
10.1109/CVPR42600.2020.01095
CLC Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
The key to image and sentence matching is accurately measuring the visual-semantic similarity between an image and a sentence. However, most existing methods exploit only the intra-modality relationship within each modality or the inter-modality relationship between image regions and sentence words for the cross-modal matching task. In contrast, in this work we propose a novel Multi-Modality Cross Attention (MMCA) Network for image and sentence matching that jointly models the intra-modality and inter-modality relationships of image regions and sentence words in a unified deep model. In the proposed MMCA, we design a novel cross-attention mechanism that exploits not only the intra-modality relationship within each modality but also the inter-modality relationship between image regions and sentence words, so that the two complement and enhance each other for image and sentence matching. Extensive experimental results on two standard benchmarks, Flickr30K and MS-COCO, demonstrate that the proposed model performs favorably against state-of-the-art image and sentence matching methods.
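The cross-attention idea described in the abstract (intra-modality relations within each modality plus inter-modality relations between image regions and sentence words) can be sketched roughly as follows. This is a hypothetical illustration rather than the authors' released MMCA code: it assumes PyTorch, substitutes standard multi-head attention for the paper's specific formulation, and all names such as CrossModalAttention are made up for the example.

```python
# Hypothetical sketch (not the published MMCA implementation): combine
# intra-modality self-attention with inter-modality cross-attention over
# image-region and sentence-word features, then score their similarity.
import torch
import torch.nn as nn


class CrossModalAttention(nn.Module):
    def __init__(self, dim: int = 512, num_heads: int = 8):
        super().__init__()
        # Intra-modality self-attention, one module per modality.
        self.self_attn_img = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.self_attn_txt = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Inter-modality cross-attention: each modality attends to the other.
        self.cross_attn_img = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.cross_attn_txt = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, regions: torch.Tensor, words: torch.Tensor) -> torch.Tensor:
        # regions: (B, R, dim) image-region features; words: (B, W, dim) word features.
        img_intra, _ = self.self_attn_img(regions, regions, regions)
        txt_intra, _ = self.self_attn_txt(words, words, words)
        # Regions query words and words query regions (inter-modality relations).
        img_inter, _ = self.cross_attn_img(regions, words, words)
        txt_inter, _ = self.cross_attn_txt(words, regions, regions)
        # Fuse intra- and inter-modality cues, then mean-pool to global embeddings.
        img_emb = (img_intra + img_inter).mean(dim=1)
        txt_emb = (txt_intra + txt_inter).mean(dim=1)
        # Cosine similarity stands in for the visual-semantic matching score.
        return nn.functional.cosine_similarity(img_emb, txt_emb, dim=-1)


if __name__ == "__main__":
    model = CrossModalAttention()
    regions = torch.randn(2, 36, 512)   # e.g. 36 detected regions per image
    words = torch.randn(2, 20, 512)     # e.g. 20 tokens per sentence
    print(model(regions, words).shape)  # torch.Size([2])
```

A typical training setup for such a model would optimize a triplet ranking loss over matched and mismatched image-sentence pairs; the cosine score above merely stands in for that matching score.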
Pages: 10938 - 10947
Number of pages: 10
Related Papers
50 items in total
  • [1] STAFuse: A Feature Decomposition Network with Super Token Attention for Multi-modality Image Fusion
    Chen, Peng
    Chen, Aiguo
    Wang, Chuang
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT VI, ICIC 2024, 2024, 14880 : 324 - 335
  • [2] Multi-Modality Reconstruction Attention and Difference Enhancement Network for Brain MRI Image Segmentation
    Zhang, Xiangfen
    Liu, Yan
    Zhang, Qingyi
    Yuan, Feiniu
    IEEE ACCESS, 2022, 10 : 31058 - 31069
  • [3] Multi-modality frequency-aware cross attention network for fake news detection
    Cui, Wei
    Zhang, Xuerui
    Shang, Mingsheng
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2024, 46 (01) : 433 - 455
  • [4] A framework for multi-modality image matching in computer assisted neuronavigation
    Maier, MW
    Erbe, H
    Kriete, A
    CAR '96: COMPUTER ASSISTED RADIOLOGY, 1996, 1124 : 994 - 994
  • [5] Multi-modality relation attention network for breast tumor classification
    Yang, Xiao
    Xi, Xiaoming
    Yang, Lu
    Xu, Chuanzhen
    Song, Zuoyong
    Nie, Xiushan
    Qiao, Lishan
    Li, Chenglong
    Shi, Qinglei
    Yin, Yilong
    COMPUTERS IN BIOLOGY AND MEDICINE, 2022, 150
  • [6] MCAPR: Multi-modality Cross Attention for Camera Absolute Pose Regression
    Shu, Qiqi
    Luan, Zhaoliang
    Poslad, Stefan
    Bourguet, Marie-Luce
    Xu, Meng
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2023, PT II, 2023, 14255 : 434 - 445
  • [7] Multi-modal Sentence Summarization with Modality Attention and Image Filtering
    Li, Haoran
    Zhu, Junnan
    Liu, Tianshang
    Zhang, Jiajun
    Zong, Chengqing
PROCEEDINGS OF THE TWENTY-SEVENTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2018: 4152 - 4158
  • [8] Decoupled Cross-Modal Phrase-Attention Network for Image-Sentence Matching
    Shi, Zhangxiang
    Zhang, Tianzhu
    Wei, Xi
    Wu, Feng
    Zhang, Yongdong
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2024, 33 : 1326 - 1337
  • [9] An Encoder Generative Adversarial Network for Multi-modality Image Recognition
    Chen, Yu
    Yang, Chunling
    Zhu, Min
    Yang, ShiYan
    IECON 2018 - 44TH ANNUAL CONFERENCE OF THE IEEE INDUSTRIAL ELECTRONICS SOCIETY, 2018, : 2689 - 2694
  • [10] Multi-Modality Deep Network for Extreme Learned Image Compression
    Jiang, Xuhao
    Tan, Weimin
    Tan, Tian
    Yan, Bo
    Shen, Liquan
THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 1, 2023: 1033 - 1041