Unifying knowledge iterative dissemination and relational reconstruction network for image-text matching

Times cited: 24

Authors:

Xie, Xiumin [1]

Li, Zhixin [1]

Tang, Zhenjun [1]

Yao, Dan [1]

Ma, Huifang [2]

Affiliations:

[1] Guangxi Normal Univ, Guangxi Key Lab Multisource Informat Min & Secur, Guilin 541004, Peoples R China

[2] Northwest Normal Univ, Coll Comp Sci & Engn, Lanzhou 730070, Peoples R China

Source:

INFORMATION PROCESSING & MANAGEMENT | 2023 / Vol. 60 / No. 1

Funding:

National Natural Science Foundation of China;

Keywords:

Image-text matching; Semantic knowledge; Similarity representation learning; Similarity-relation learning; Graph neural network; ATTENTION;

DOI:

10.1016/j.ipm.2022.103154

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline code:

0812;

Abstract:

Image-text matching is a crucial branch of multimedia retrieval that relies on learning inter-modal correspondences. Most existing methods focus on either global or local correspondence and fail to explore fine-grained global-local alignment. Moreover, how to infer more accurate similarity scores remains an open problem. In this study, we propose a novel unifying knowledge iterative dissemination and relational reconstruction (KIDRR) network for image-text matching. In particular, the knowledge graph iterative dissemination module is designed to iteratively broadcast global semantic knowledge, enabling relevant nodes to be associated and yielding fine-grained intra-modal correlations and features. On this basis, vector-based similarity representations are learned from multiple perspectives to model multi-level alignments comprehensively. The relation graph reconstruction module is further developed to enhance cross-modal correspondences by constructing similarity relation graphs and adaptively reconstructing them. We conducted experiments on the Flickr30K and MSCOCO datasets, which contain 31,783 and 123,287 images, respectively. Experiments show that, compared with the current state-of-the-art baselines, KIDRR improves Recall@1 by nearly 2.2% and 1.6% on Flickr30K and MSCOCO, respectively.
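The results above are reported as Recall@1, the standard retrieval metric on Flickr30K and MSCOCO. As a minimal sketch (not the authors' evaluation code), the functions below compute Recall@K in both retrieval directions from an image-by-caption similarity matrix. The function names and the convention that caption j describes image j // caps_per_image (five captions per image by default) are assumptions for illustration.

```python
from typing import Sequence


def recall_at_k_i2t(sims: Sequence[Sequence[float]], k: int, caps_per_image: int = 5) -> float:
    """Image-to-text Recall@K in percent.

    sims[i][j] is the similarity between image i and caption j.
    Assumption: caption j describes image j // caps_per_image.
    """
    hits = 0
    for i, row in enumerate(sims):
        # Caption indices sorted by descending similarity (ties keep index order).
        order = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        ground_truth = range(i * caps_per_image, (i + 1) * caps_per_image)
        # A query counts as a hit if ANY of its ground-truth captions is in the top K.
        best_rank = min(order.index(j) for j in ground_truth)
        if best_rank < k:
            hits += 1
    return 100.0 * hits / len(sims)


def recall_at_k_t2i(sims: Sequence[Sequence[float]], k: int, caps_per_image: int = 5) -> float:
    """Text-to-image Recall@K in percent (each caption has one ground-truth image)."""
    n_images = len(sims)
    n_captions = len(sims[0])
    hits = 0
    for j in range(n_captions):
        # Column j: similarity of caption j to every image.
        column = [sims[i][j] for i in range(n_images)]
        order = sorted(range(n_images), key=lambda i: column[i], reverse=True)
        if order.index(j // caps_per_image) < k:
            hits += 1
    return 100.0 * hits / n_captions


if __name__ == "__main__":
    # Toy example: 2 images, 2 captions per image (4 captions total).
    sims = [
        [0.9, 0.1, 0.2, 0.3],
        [0.8, 0.7, 0.6, 0.05],
    ]
    print(recall_at_k_i2t(sims, 1, caps_per_image=2))
    print(recall_at_k_t2i(sims, 1, caps_per_image=2))
```

Taking the best rank over all ground-truth captions reflects the multi-caption setup of these datasets; papers usually report R@1, R@5 and R@10 in both directions.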


Pages: 16
