Multi-Modality Cross Attention Network for Image and Sentence Matching

Cited by: 238
Authors
Wei, Xi [1]
Zhang, Tianzhu [1]
Li, Yan [2]
Zhang, Yongdong [1]
Wu, Feng [1]
Affiliations
[1] University of Science and Technology of China, Hefei, Anhui, People's Republic of China
[2] Kuaishou Technology, Beijing, People's Republic of China
Keywords
DOI
10.1109/CVPR42600.2020.01095
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
The key of image and sentence matching is to accurately measure the visual-semantic similarity between an image and a sentence. However, most existing methods make use of only the intra-modality relationship within each modality or the inter-modality relationship between image regions and sentence words for the cross-modal matching task. Different from them, in this work, we propose a novel Multi-Modality Cross Attention (MMCA) Network for image and sentence matching by jointly modeling the intra-modality and inter-modality relationships of image regions and sentence words in a unified deep model. In the proposed MMCA, we design a novel cross-attention mechanism, which is able to exploit not only the intra-modality relationship within each modality, but also the inter-modality relationship between image regions and sentence words to complement and enhance each other for image and sentence matching. Extensive experimental results on two standard benchmarks including Flickr30K and MS-COCO demonstrate that the proposed model performs favorably against state-of-the-art image and sentence matching methods.
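The abstract does not give implementation details, so the following is only a minimal sketch of the general idea of jointly modeling intra-modality and inter-modality relationships. It assumes plain scaled dot-product attention with no learned projections, and the toy feature vectors (`regions`, `words`) are hypothetical. It is not the authors' MMCA implementation. When attention runs over the concatenation of region and word features, each token's weights cover both its own modality and the other one.

```python
import math

def softmax(row):
    # Numerically stable softmax over a list of scores.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of the value vectors, one output dimension at a time.
        outputs.append([sum(wi * v[j] for wi, v in zip(w, values))
                        for j in range(len(values[0]))])
    return outputs, weights

# Hypothetical toy features: 3 image regions and 2 sentence words, dimension 4.
regions = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.5, 0.5, 0.0, 0.0]]
words = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]

# Intra-modality only: regions attend to regions.
intra_out, intra_w = attention(regions, regions, regions)

# Joint modeling (assumed here as attention over the concatenated sequence):
# every token attends to all regions and all words at once.
tokens = regions + words
joint_out, joint_w = attention(tokens, tokens, tokens)

# Columns 0-2 of each weight row are region weights, columns 3-4 word weights.
region0_to_words = joint_w[0][3:]
```

In this sketch, `joint_w[0][:3]` holds the intra-modality weights of the first region and `joint_w[0][3:]` its inter-modality weights toward the two words.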
Pages: 10938-10947
Page count: 10
Related papers
50 in total
  • [21] SynergyX: a multi-modality mutual attention network for interpretable drug synergy prediction
    Guo, Yue
    Hu, Haitao
    Chen, Wenbo
    Yin, Hao
    Wu, Jian
    Hsieh, Chang-Yu
    He, Qiaojun
    Cao, Ji
    BRIEFINGS IN BIOINFORMATICS, 2024, 25 (02)
  • [22] A MULTI-MODALITY FUSION NETWORK BASED ON ATTENTION MECHANISM FOR BRAIN TUMOR SEGMENTATION
    Zhou, Tongxue
    Ruan, Su
    Guo, Yu
    Canu, Stephane
    2020 IEEE 17TH INTERNATIONAL SYMPOSIUM ON BIOMEDICAL IMAGING (ISBI 2020), 2020, : 377 - 380
  • [23] Automatic Landmark Based Multi-Modality Medical Image Registration Using Block Matching
    Wang, Yuanjun
    Zha, Shanshan
    Liu, Yu
    Nie, Shengdong
    JOURNAL OF MEDICAL IMAGING AND HEALTH INFORMATICS, 2015, 5 (08) : 1848 - 1852
  • [24] CMR-net: A cross modality reconstruction network for multi-modality remote sensing classification
    Wang, Huiqing
    Wang, Huajun
    Wu, Lingfeng
    PLOS ONE, 2024, 19 (06)
  • [25] Co-segmentation of Multi-modality Spinal Image Using Channel and Spatial Attention
    Zou, Yaocong
    Shi, Yonghong
    MACHINE LEARNING IN MEDICAL IMAGING, MLMI 2021, 2021, 12966 : 287 - 295
  • [26] Unbiased Multi-modality Guidance for Image Inpainting
    Yu, Yongsheng
    Du, Dawei
    Zhang, Libo
    Luo, Tiejian
    COMPUTER VISION - ECCV 2022, PT XVI, 2022, 13676 : 668 - 684
  • [27] A new multi-modality image registration algorithm
    Samant, S
    Parra, N
    Davis, B
    Sontag, M
    Narasimhan, G
    MEDICAL PHYSICS, 2002, 29 (06) : 1244 - 1244
  • [28] The Research of Multi-Modality Parkinson's Disease Image Based on Cross-Layer Convolutional Neural Network
    Dai, Yin
    Tao, Zuitian
    Wang, Yang
    Zhao, Yiqi
    Hou, Jiaxin
    JOURNAL OF MEDICAL IMAGING AND HEALTH INFORMATICS, 2019, 9 (07) : 1440 - 1447
  • [29] CRCNet: Global-local context and multi-modality cross attention for polyp segmentation
    Zhu, Jianbo
    Ge, Mingfeng
    Chang, Zhimin
    Dong, Wenfei
    BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2023, 83
  • [30] Multi-modality Sensor Data Classification with Selective Attention
    Zhang, Xiang
    Yao, Lina
    Huang, Chaoran
    Wang, Sen
    Tan, Mingkui
    Long, Guodong
    Wang, Can
    PROCEEDINGS OF THE TWENTY-SEVENTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2018, : 3111 - 3117