Unifying knowledge iterative dissemination and relational reconstruction network for image-text matching

Cited by: 24
Authors
Xie, Xiumin [1]
Li, Zhixin [1]
Tang, Zhenjun [1]
Yao, Dan [1]
Ma, Huifang [2]
Affiliations
[1] Guangxi Normal Univ, Guangxi Key Lab Multisource Informat Min & Secur, Guilin 541004, Peoples R China
[2] Northwest Normal Univ, Coll Comp Sci & Engn, Lanzhou 730070, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Image-text matching; Semantic knowledge; Similarity representation learning; Similarity-relation learning; Graph neural network; ATTENTION
DOI
10.1016/j.ipm.2022.103154
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Image-text matching is a crucial branch of multimedia retrieval that relies on learning inter-modal correspondences. Most existing methods focus on either global or local correspondence and fail to explore fine-grained global-local alignment. Moreover, the issue of how to infer more accurate similarity scores remains unresolved. In this study, we propose a novel unifying knowledge iterative dissemination and relational reconstruction (KIDRR) network for image-text matching. Specifically, the knowledge graph iterative dissemination module is designed to iteratively broadcast global semantic knowledge, enabling relevant nodes to be associated and yielding fine-grained intra-modal correlations and features. Hence, vector-based similarity representations are learned from multiple perspectives to model multi-level alignments comprehensively. The relation graph reconstruction module is further developed to enhance cross-modal correspondences by constructing similarity relation graphs and adaptively reconstructing them. We conducted experiments on the Flickr30K and MSCOCO datasets, which contain 31,783 and 123,287 images, respectively. Experiments show that KIDRR achieves improvements of nearly 2.2% and 1.6% relative to Recall@1 on Flickr30K and MSCOCO, respectively, compared to the current state-of-the-art baselines.
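The abstract reports its results in terms of Recall@1. As a rough illustration of how such a metric is commonly computed for retrieval, here is a minimal sketch in plain Python. It is not taken from the paper: the function name `recall_at_k`, the toy similarity matrix, and the ground-truth sets are all hypothetical, and the paper's own evaluation protocol may differ in detail.

```python
def recall_at_k(sim, gt, k=1):
    """Fraction of queries with at least one correct match in the top k.

    sim[i][j] is the similarity between query i and candidate j.
    gt[i] is the set of candidate indices that correctly match query i.
    (Both structures are illustrative assumptions, not the paper's format.)
    """
    hits = 0
    for scores, positives in zip(sim, gt):
        # Rank candidate indices by descending similarity score.
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        # Count the query as a hit if any correct candidate is in the top k.
        if positives & set(ranked[:k]):
            hits += 1
    return hits / len(sim)


# Toy data: 3 queries scored against 3 candidates (made-up numbers).
sim = [
    [0.9, 0.1, 0.3],
    [0.2, 0.4, 0.8],
    [0.5, 0.6, 0.1],
]
gt = [{0}, {1}, {1}]

r1 = recall_at_k(sim, gt, k=1)
r2 = recall_at_k(sim, gt, k=2)
```

In this toy case the second query ranks its correct candidate second, so it counts as a miss at k=1 and a hit at k=2.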
Pages: 16