Unifying knowledge iterative dissemination and relational reconstruction network for image-text matching

Cited by: 24

Authors:

Xie, Xiumin [1]

Li, Zhixin [1]

Tang, Zhenjun [1]

Yao, Dan [1]

Ma, Huifang [2]

Affiliations:

[1] Guangxi Normal Univ, Guangxi Key Lab Multisource Informat Min & Secur, Guilin 541004, Peoples R China

[2] Northwest Normal Univ, Coll Comp Sci & Engn, Lanzhou 730070, Peoples R China

Source:

INFORMATION PROCESSING & MANAGEMENT | 2023 / Vol. 60 / Issue 01

Funding:

National Natural Science Foundation of China;

Keywords:

Image-text matching; Semantic knowledge; Similarity representation learning; Similarity-relation learning; Graph neural network; ATTENTION;

DOI:

10.1016/j.ipm.2022.103154

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Image-text matching is a crucial branch of multimedia retrieval that relies on learning inter-modal correspondences. Most existing methods focus on either global or local correspondence and fail to explore fine-grained global-local alignment. Moreover, the issue of how to infer more accurate similarity scores remains unresolved. In this study, we propose a novel unifying knowledge iterative dissemination and relational reconstruction (KIDRR) network for image-text matching. Specifically, the knowledge graph iterative dissemination module is designed to iteratively broadcast global semantic knowledge, enabling relevant nodes to be associated and yielding fine-grained intra-modal correlations and features. Vector-based similarity representations are then learned from multiple perspectives to model multi-level alignments comprehensively. The relation graph reconstruction module is further developed to enhance cross-modal correspondences by constructing similarity relation graphs and adaptively reconstructing them. We conducted experiments on the Flickr30K and MSCOCO datasets, which contain 31,783 and 123,287 images, respectively. Experiments show that KIDRR improves Recall@1 by nearly 2.2% on Flickr30K and 1.6% on MSCOCO compared to the current state-of-the-art baselines.
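The abstract reports results in terms of Recall@1. As a minimal sketch of how a Recall@K score is commonly computed from a query-by-candidate similarity matrix, the following may help; it is not the authors' evaluation code, and the function name `recall_at_k`, its arguments, and the toy scores are all hypothetical.

```python
def recall_at_k(similarity, ground_truth, k=1):
    """Fraction of queries with at least one correct candidate in the top k.

    similarity[i][j]: score between query i and candidate j (higher is better).
    ground_truth[i]: set of candidate indices that are correct for query i.
    Both inputs are illustrative assumptions, not the paper's data format.
    """
    hits = 0
    for scores, correct in zip(similarity, ground_truth):
        # Rank candidate indices from highest to lowest similarity.
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        if any(j in correct for j in ranked[:k]):
            hits += 1
    return hits / len(similarity)


# Toy example: 3 text queries scored against 3 candidate images.
sim = [
    [0.9, 0.1, 0.3],  # query 0: correct image 0 is ranked first
    [0.2, 0.4, 0.8],  # query 1: correct image 1 is ranked second
    [0.1, 0.2, 0.7],  # query 2: correct image 2 is ranked first
]
truth = [{0}, {1}, {2}]

r1 = recall_at_k(sim, truth, k=1)
r2 = recall_at_k(sim, truth, k=2)
```

In this toy case two of the three queries retrieve the correct image at rank 1, so `r1` is 2/3, while `r2` is 1.0 because every correct image appears within the top two.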

Pages: 16