Graph structure prefix injection transformer for multi-modal entity alignment

Cited by: 0
|
Authors
Zhang, Yan [1 ,2 ,3 ,4 ,5 ]
Luo, Xiangyu [2 ]
Hu, Jing [2 ]
Zhang, Miao [1 ,3 ,4 ]
Xiao, Kui [1 ,3 ,4 ]
Li, Zhifei [1 ,2 ,3 ,4 ,5 ]
Affiliations
[1] Hubei Univ, Sch Comp Sci, Wuhan 430062, Peoples R China
[2] Hubei Univ, Sch Cyber Sci & Technol, Wuhan 430062, Peoples R China
[3] Hubei Univ, Hubei Key Lab Big Data Intelligent Anal & Applicat, Wuhan 430062, Peoples R China
[4] Hubei Univ, Key Lab Intelligent Sensing Syst & Secur, Minist Educ, Wuhan 430062, Peoples R China
[5] Hubei Univ, Hubei Prov Engn Res Ctr Intelligent Connected Vehi, Wuhan 430062, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multi-modal knowledge graphs; Multi-modal entity alignment; Contrastive learning; KNOWLEDGE GRAPHS;
DOI
10.1016/j.ipm.2024.104048
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Multi-modal entity alignment (MMEA) aims to integrate corresponding entities across different multi-modal knowledge graphs (MMKGs). However, previous studies have not adequately considered the impact of graph structural heterogeneity on entity alignment. Different MMKGs typically exhibit variations in graph structural features, so the same entity relationship can take distinct structural representations, and the graphs' topological structures also differ. To tackle these challenges, we introduce GSIEA, an MMEA framework that integrates structural prefix injection and modality fusion. Unlike methods that directly fuse structural data with multi-modal features to perform alignment, GSIEA processes structural data separately from multi-modal data such as images and attributes, incorporating a prefix-injection interaction module within a multi-head attention mechanism to make better use of multi-modal information and to minimize the impact of graph structural differences. In addition, GSIEA employs a convolutional enhancement module to extract fine-grained multi-modal features and computes cross-modal weights for feature fusion. Experiments on two public datasets, containing 12,846 and 11,199 entity pairs respectively, show that GSIEA outperforms baseline models, with an average improvement of 3.26% in MRR (maximum 12.5%) and an average improvement of 4.96% in Hits@1 (maximum 16.92%). The code of our model is available at https://github.com/HubuKG/GSIEA.
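The prefix-injection idea described in the abstract can be illustrated with a small sketch: structure-derived vectors are prepended to the keys and values of a multi-head attention layer, so the multi-modal queries attend to structural context without the two feature spaces being fused directly. The following is a minimal NumPy illustration under that assumption; all names, shapes, and the exact injection scheme are hypothetical and not taken from the GSIEA implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def prefix_injected_attention(entity_feats, struct_prefix, num_heads=4):
    """One attention layer where graph-structure embeddings are
    prepended to the key/value sequence (prefix injection).

    entity_feats:  (n, d) multi-modal entity features (the queries)
    struct_prefix: (p, d) structure-derived prefix vectors
    Returns:       (n, d) entity features enriched with structural context
    """
    n, d = entity_feats.shape
    p = struct_prefix.shape[0]
    dh = d // num_heads
    # Keys/values see the structural prefix followed by the entity features,
    # so structure influences attention without being mixed into the queries.
    kv = np.concatenate([struct_prefix, entity_feats], axis=0)       # (p+n, d)
    q = entity_feats.reshape(n, num_heads, dh).transpose(1, 0, 2)    # (h, n, dh)
    k = kv.reshape(p + n, num_heads, dh).transpose(1, 0, 2)          # (h, p+n, dh)
    v = k
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(dh)                  # (h, n, p+n)
    out = softmax(scores) @ v                                        # (h, n, dh)
    return out.transpose(1, 0, 2).reshape(n, d)

rng = np.random.default_rng(0)
feats = rng.normal(size=(6, 8))    # 6 entities, feature dim 8
prefix = rng.normal(size=(3, 8))   # 3 structural prefix slots
fused = prefix_injected_attention(feats, prefix, num_heads=2)
print(fused.shape)
```

In this sketch the prefix length p is a free hyperparameter; each query row attends over p + n key positions, which is the standard prefix-tuning pattern applied to structural embeddings.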
Pages: 17
Related Papers
50 records total
  • [21] Multi-modal entity alignment based on joint knowledge representation learning
    Wang, H.-Y.
    Lun, B.
    Zhang, X.-M.
    Sun, X.-L.
    Kongzhi yu Juece/Control and Decision, 2021, 35 (12): : 2855 - 2864
  • [22] Multi-Modal Entity Alignment Using Uncertainty Quantification for Modality Importance
    Hama, Kenta
    Matsubara, Takashi
    IEEE ACCESS, 2023, 11 : 28479 - 28489
  • [23] MFIEA: entity alignment through multi-modal feature interaction and knowledge facts
    Zhang, Xiaoming
    Lv, Menglong
    Wang, Huiyong
    Naseriparsa, Mehdi
    JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, 2025,
  • [24] Rethinking Uncertainly Missing and Ambiguous Visual Modality in Multi-Modal Entity Alignment
    Chen, Zhuo
    Guo, Lingbing
    Fang, Yin
    Zhang, Yichi
    Chen, Jiaoyan
    Pan, Jeff Z.
    Li, Yangning
    Chen, Huajun
    Zhang, Wen
    SEMANTIC WEB, ISWC 2023, PART I, 2023, 14265 : 121 - 139
  • [25] Self-Supervised Entity Alignment Based on Multi-Modal Contrastive Learning
    Liu, Bo
    Song, Ruoyi
    Xiang, Yuejia
    Du, Junbo
    Ruan, Weijian
    Hu, Jinhui
    IEEE/CAA JOURNAL OF AUTOMATICA SINICA, 2022, 9 (11) : 2031 - 2033
  • [27] Multi-modal Graph Fusion for Named Entity Recognition with Targeted Visual Guidance
    Zhang, Dong
    Wei, Suzhong
    Li, Shoushan
    Wu, Hanqian
    Zhu, Qiaoming
    Zhou, Guodong
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 14347 - 14355
  • [28] Boosting Entity-Aware Image Captioning With Multi-Modal Knowledge Graph
    Zhao, Wentian
    Wu, Xinxiao
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 2659 - 2670
  • [29] Multi-modal Entity Alignment via Position-enhanced Multi-label Propagation
    Tang, Wei
    Wang, Yuanyi
    PROCEEDINGS OF THE 4TH ANNUAL ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2024, 2024, : 366 - 375
  • [30] Hierarchical Multi-Modal Prompting Transformer for Multi-Modal Long Document Classification
    Liu, Tengfei
    Hu, Yongli
    Gao, Junbin
    Sun, Yanfeng
    Yin, Baocai
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (07) : 6376 - 6390