Multi-scale motivated neural network for image-text matching

Times Cited: 1
Authors
Qin, Xueyang [1 ]
Li, Lishuang [1 ]
Pang, Guangyao [2 ]
Affiliations
[1] Dalian Univ Technol, Sch Comp Sci & Technol, Dalian 116024, Peoples R China
[2] Wuzhou Univ, Sch Data Sci & Software Engn, Wuzhou 543002, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Image-text matching; Multi-scale information; Cross-modal interaction; Matching score fusion algorithm; ATTENTION;
DOI
10.1007/s11042-023-15321-0
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Existing mainstream image-text matching methods usually measure the relevance of an image-text pair by capturing and aggregating the affinities between textual words and visual regions, but they fail to consider the single-scale matching bias caused by the imbalance between image and text information. In this paper, we design a Multi-Scale Motivated Neural Network (MSMNN) model for image-text matching. In contrast to previous single-scale methods, MSMNN extracts visual and textual features at three scales: local, global and salient. This exploits the complementarity of multi-scale matching to reduce the bias of single-scale matching. We also propose a cross-modal interaction module that fuses visual and textual features during local alignment, so as to discover the latent relationships between image-text pairs. Furthermore, we propose a matching score fusion algorithm that combines the matching results from the three levels and can be applied to other initial image-text matching results with negligible overhead. Extensive experiments validate the effectiveness of our method, which achieves competitive results on two well-known datasets, Flickr30K and MSCOCO, with gains of 1.04% and 0.59% on the mR evaluation metric over advanced methods.
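The abstract describes fusing matching scores computed at the local, global, and salient scales, but does not spell out the fusion algorithm. The sketch below only illustrates the general idea under stated assumptions: the function name, the per-scale min-max normalization, and the fixed weights are illustrative choices, not the authors' published method.

```python
import numpy as np

def fuse_matching_scores(local_s, global_s, salient_s, weights=(0.4, 0.4, 0.2)):
    """Weighted fusion of (N_images, N_texts) similarity matrices from three scales.

    Hypothetical sketch: each scale's scores are min-max normalized so that no
    single scale dominates, then combined with assumed fixed weights.
    """
    fused = np.zeros_like(np.asarray(local_s, dtype=float))
    for s, w in zip((local_s, global_s, salient_s), weights):
        s = np.asarray(s, dtype=float)
        span = s.max() - s.min()
        s_norm = (s - s.min()) / span if span > 0 else np.zeros_like(s)
        fused += w * s_norm
    return fused

# Toy usage: rank 3 candidate captions for each of 3 images.
gen = np.random.default_rng(0)
scores = fuse_matching_scores(gen.random((3, 3)), gen.random((3, 3)), gen.random((3, 3)))
print(scores.argsort(axis=1)[:, ::-1])  # text indices ranked by fused score, per image
```

Because such a fusion step operates on precomputed similarity matrices, it can in principle be applied on top of other matching models' outputs with little extra cost, which matches the abstract's claim of negligible overhead.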
Pages: 4383-4407
Page count: 25
Related Papers
50 records in total
  • [1] Multi-scale motivated neural network for image-text matching
    Xueyang Qin
    Lishuang Li
    Guangyao Pang
    Multimedia Tools and Applications, 2024, 83 : 4383 - 4407
  • [2] Multi-level Symmetric Semantic Alignment Network for image-text matching
    Wang, Wenzhuang
    Di, Xiaoguang
    Liu, Maozhen
    Gao, Feng
    NEUROCOMPUTING, 2024, 599
  • [3] Multi-Modal Memory Enhancement Attention Network for Image-Text Matching
    Ji, Zhong
    Lin, Zhigang
    Wang, Haoran
    He, Yuqing
    IEEE ACCESS, 2020, 8 : 38438 - 38447
  • [4] Generative label fused network for image-text matching
    Zhao, Guoshuai
    Zhang, Chaofeng
    Shang, Heng
    Wang, Yaxiong
    Zhu, Li
    Qian, Xueming
    KNOWLEDGE-BASED SYSTEMS, 2023, 263
  • [5] Cross-modal Semantically Augmented Network for Image-text Matching
    Yao, Tao
    Li, Yiru
    Li, Ying
    Zhu, Yingying
    Wang, Gang
    Yue, Jun
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2024, 20 (04)
  • [6] Context-Aware Multi-View Summarization Network for Image-Text Matching
    Qu, Leigang
    Liu, Meng
    Cao, Da
    Nie, Liqiang
    Tian, Qi
    MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020, : 1047 - 1055
  • [7] Dual Semantic Relationship Attention Network for Image-Text Matching
    Wen, Keyu
    Gu, Xiaodong
    2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020,
  • [8] Reference-Aware Adaptive Network for Image-Text Matching
    Xiong, Guoxin
    Meng, Meng
    Zhang, Tianzhu
    Zhang, Dongming
    Zhang, Yongdong
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (10) : 9678 - 9691
  • [9] Globally Guided Confidence Enhancement Network for Image-Text Matching
    Dai, Xin
    Tuerhong, Gulanbaier
    Wushouer, Mairidan
    APPLIED SCIENCES-BASEL, 2023, 13 (09):
  • [10] A Mutually Textual and Visual Refinement Network for Image-Text Matching
    Pang, Shanmin
    Zeng, Yueyang
    Zhao, Jiawei
    Xue, Jianru
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 7555 - 7566