Unifying knowledge iterative dissemination and relational reconstruction network for image-text matching

Cited by: 24

Authors:

Xie, Xiumin [1]

Li, Zhixin [1]

Tang, Zhenjun [1]

Yao, Dan [1]

Ma, Huifang [2]

Affiliations:

[1] Guangxi Normal Univ, Guangxi Key Lab Multisource Informat Min & Secur, Guilin 541004, Peoples R China

[2] Northwest Normal Univ, Coll Comp Sci & Engn, Lanzhou 730070, Peoples R China

Source:

INFORMATION PROCESSING & MANAGEMENT | 2023 / Vol. 60 / Issue 01

Funding:

National Natural Science Foundation of China;

Keywords:

Image-text matching; Semantic knowledge; Similarity representation learning; Similarity-relation learning; Graph neural network; ATTENTION;

DOI:

10.1016/j.ipm.2022.103154

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Image-text matching is a crucial branch of multimedia retrieval that relies on learning inter-modal correspondences. Most existing methods focus on global or local correspondence and fail to explore fine-grained global-local alignment. Moreover, the issue of how to infer more accurate similarity scores remains unresolved. In this study, we propose a novel unifying knowledge iterative dissemination and relational reconstruction (KIDRR) network for image-text matching. In particular, the knowledge graph iterative dissemination module is designed to iteratively broadcast global semantic knowledge, enabling relevant nodes to be associated and yielding fine-grained intra-modal correlations and features. Hence, vector-based similarity representations are learned from multiple perspectives to model multi-level alignments comprehensively. The relation graph reconstruction module is further developed to enhance cross-modal correspondences by constructing similarity relation graphs and adaptively reconstructing them. We conducted experiments on the Flickr30K and MSCOCO datasets, which have 31,783 and 123,287 images, respectively. Experiments show that KIDRR achieves improvements of nearly 2.2% and 1.6% relative to Recall@1 on Flickr30K and MSCOCO, respectively, compared to the current state-of-the-art baselines.
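The abstract reports its gains in Recall@1. As a minimal sketch of how Recall@K is commonly computed from a query-by-candidate similarity matrix, the following may help; the function name `recall_at_k`, the data layout, and the toy scores are illustrative assumptions and are not taken from the paper or its evaluation code.

```python
def recall_at_k(similarity, ground_truth, k=1):
    """Fraction of queries with at least one true match in the top-k candidates.

    similarity[i][j] is the score between query i and candidate j (assumed layout);
    ground_truth[i] is the set of candidate indices that match query i.
    """
    hits = 0
    for scores, matches in zip(similarity, ground_truth):
        # Rank candidate indices by descending similarity score.
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        if any(j in matches for j in ranked[:k]):
            hits += 1
    return hits / len(similarity)


# Toy example: 3 image queries scored against 3 captions (hypothetical values).
sim = [
    [0.9, 0.1, 0.3],  # query 0: matching caption 0 is ranked first
    [0.2, 0.4, 0.8],  # query 1: matching caption 1 is ranked second
    [0.1, 0.2, 0.7],  # query 2: matching caption 2 is ranked first
]
truth = [{0}, {1}, {2}]
r1 = recall_at_k(sim, truth, k=1)
r2 = recall_at_k(sim, truth, k=2)
```

In this toy case two of the three queries retrieve their match at rank 1, and all three do within the top 2. Benchmarks such as Flickr30K pair each image with several captions, which is why the ground truth is modeled here as a set per query.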


Pages: 16
