Multi-scale motivated neural network for image-text matching

Cited by: 1
Authors
Qin, Xueyang [1]
Li, Lishuang [1]
Pang, Guangyao [2]
Affiliations
[1] Dalian Univ Technol, Sch Comp Sci & Technol, Dalian 116024, Peoples R China
[2] Wuzhou Univ, Sch Data Sci & Software Engn, Wuzhou 543002, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Image-text matching; Multi-scale information; Cross-modal interaction; Matching score fusion algorithm; Attention
DOI
10.1007/s11042-023-15321-0
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Mainstream image-text matching methods usually measure the relevance of an image-text pair by capturing and aggregating the affinities between textual words and visual regions, but they fail to account for the single-scale matching bias caused by the imbalance between image and text information. In this paper, we design a Multi-Scale Motivated Neural Network (MSMNN) model for image-text matching. In contrast to previous single-scale methods, MSMNN encourages the network to extract visual and textual features at three scales (local, global, and salient features), exploiting the complementarity of multi-scale matching to reduce the bias of single-scale matching. We also propose a cross-modal interaction module that fuses visual and textual features during local alignment, so as to discover potential relationships between image-text pairs. Furthermore, we propose a matching score fusion algorithm that fuses the matching results from the three levels and can be freely applied to other initial image-text matching results with negligible overhead. Extensive experiments validate the effectiveness of our method, which achieves competitive results on two well-known datasets, Flickr30K and MSCOCO, improving the mR evaluation metric by 1.04% and 0.59%, respectively, over the advanced method it is compared against.
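The abstract does not specify how the matching score fusion algorithm combines the three levels, so the following is only a minimal sketch of one generic late-fusion scheme, not the authors' method. It assumes each scale (local, global, salient) has already produced an image-by-text score matrix, rescales each matrix to [0, 1], and takes a weighted average. The function names, the equal weights, and the toy scores are all hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]  # rows = images, columns = candidate texts


def min_max_normalize(scores: Matrix) -> Matrix:
    """Rescale all entries of one score matrix to the range [0, 1]."""
    flat = [s for row in scores for s in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # constant matrix: carries no ranking information
        return [[0.0 for _ in row] for row in scores]
    return [[(s - lo) / (hi - lo) for s in row] for row in scores]


def fuse_scores(matrices: Sequence[Matrix], weights: Sequence[float]) -> Matrix:
    """Weighted average of normalized score matrices (assumed fusion rule)."""
    if len(matrices) != len(weights):
        raise ValueError("need exactly one weight per score matrix")
    total = sum(weights)
    normed = [min_max_normalize(m) for m in matrices]
    n_rows, n_cols = len(normed[0]), len(normed[0][0])
    return [
        [
            sum(w * m[i][j] for w, m in zip(weights, normed)) / total
            for j in range(n_cols)
        ]
        for i in range(n_rows)
    ]


def recall_at_k(scores: Matrix, k: int) -> float:
    """Fraction of images whose matching text (same index) ranks in the top k."""
    hits = 0
    for i, row in enumerate(scores):
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(scores)


# Toy scores for 2 images x 2 texts; image i is assumed to match text i.
local_scores = [[0.9, 0.8], [0.2, 0.4]]
global_scores = [[0.3, 0.6], [0.1, 0.7]]  # ranks the wrong text first for image 0
salient_scores = [[0.8, 0.1], [0.3, 0.5]]

fused = fuse_scores([local_scores, global_scores, salient_scores], [1.0, 1.0, 1.0])
print(recall_at_k(global_scores, 1), recall_at_k(fused, 1))
```

Because the sketch works only on score matrices that already exist, it adds little computation on top of the individual matchers, which is in the spirit of the "negligible overhead" the abstract claims for post hoc score fusion.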
Pages: 4383-4407
Number of pages: 25