Graph structure prefix injection transformer for multi-modal entity alignment

Cited: 0
|
Authors
Zhang, Yan [1 ,2 ,3 ,4 ,5 ]
Luo, Xiangyu [2 ]
Hu, Jing [2 ]
Zhang, Miao [1 ,3 ,4 ]
Xiao, Kui [1 ,3 ,4 ]
Li, Zhifei [1 ,2 ,3 ,4 ,5 ]
Affiliations
[1] Hubei Univ, Sch Comp Sci, Wuhan 430062, Peoples R China
[2] Hubei Univ, Sch Cyber Sci & Technol, Wuhan 430062, Peoples R China
[3] Hubei Univ, Hubei Key Lab Big Data Intelligent Anal & Applicat, Wuhan 430062, Peoples R China
[4] Hubei Univ, Key Lab Intelligent Sensing Syst & Secur, Minist Educ, Wuhan 430062, Peoples R China
[5] Hubei Univ, Hubei Prov Engn Res Ctr Intelligent Connected Vehi, Wuhan 430062, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multi-modal knowledge graphs; Multi-modal entity alignment; Contrastive learning; KNOWLEDGE GRAPHS;
DOI
10.1016/j.ipm.2024.104048
CLC Classification Number
TP [Automation technology, computer technology];
Discipline Classification Code
0812 ;
Abstract
Multi-modal entity alignment (MMEA) aims to integrate corresponding entities across different multi-modal knowledge graphs (MMKGs). However, previous studies have not adequately considered the impact of graph structural heterogeneity on entity alignment. Different MMKGs typically differ in their graph structural features, so the same entity relationship can take distinct structural representations, and the graphs' topological structures also differ. To tackle these challenges, we introduce GSIEA, an MMEA framework that integrates structural prefix injection and modality fusion. Unlike methods that directly fuse structural data with multi-modal features to perform alignment, GSIEA processes structural data separately from multi-modal data such as images and attributes, incorporating a prefix-injection interaction module within a multi-head attention mechanism to make better use of multi-modal information and reduce the impact of graph structural differences. In addition, GSIEA employs a convolutional enhancement module to extract fine-grained multi-modal features and computes cross-modal weights to fuse them. Experimental evaluations on two public datasets, containing 12,846 and 11,199 entity pairs, respectively, show that GSIEA outperforms baseline models, with an average improvement of 3.26% in MRR (maximum gain 12.5%) and an average improvement of 4.96% in Hits@1 (maximum gain 16.92%). The code of our model is available at https://github.com/HubuKG/GSIEA.
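The prefix-injection idea described in the abstract can be illustrated with a minimal sketch: rather than fusing structural embeddings directly into the multi-modal features, learned structural "prefix" key/value vectors are prepended to the attention keys and values, so multi-modal queries can attend to structural information without it being mixed into their representations. This is a simplified single-head illustration with hypothetical shapes and randomly initialized vectors, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def prefix_attention(queries, keys, values, prefix_k, prefix_v):
    """Single-head attention with structural prefix injection.

    Learned structural prefix key/value vectors are prepended to the
    modality keys/values, so graph structure influences the attention
    distribution without being fused into the multi-modal features.
    """
    k = np.concatenate([prefix_k, keys], axis=0)    # (p + n, d)
    v = np.concatenate([prefix_v, values], axis=0)  # (p + n, d)
    d = queries.shape[-1]
    scores = queries @ k.T / np.sqrt(d)             # (m, p + n)
    weights = softmax(scores, axis=-1)              # rows sum to 1
    return weights @ v                              # (m, d)

# Toy dimensions: d = feature size, n = modality tokens,
# m = query tokens, p = structural prefix length (all hypothetical).
rng = np.random.default_rng(0)
d, n, m, p = 8, 5, 3, 2
out = prefix_attention(rng.normal(size=(m, d)),
                       rng.normal(size=(n, d)),
                       rng.normal(size=(n, d)),
                       rng.normal(size=(p, d)),
                       rng.normal(size=(p, d)))
print(out.shape)  # (3, 8)
```

In a full model the prefix vectors would be trainable parameters derived from the graph structure encoder, and the mechanism would be replicated per attention head.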
Pages: 17
Related Papers
50 total
  • [41] Multi-modal transformer for fake news detection
    Yang, Pingping
    Ma, Jiachen
    Liu, Yong
    Liu, Meng
    MATHEMATICAL BIOSCIENCES AND ENGINEERING, 2023, 20 (08) : 14699 - 14717
  • [42] Multi-hop neighbor fusion enhanced hierarchical transformer for multi-modal knowledge graph completion
    Wang, Yunpeng
    Ning, Bo
    Wang, Xin
    Li, Guanyu
    WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS, 2024, 27 (05)
  • [43] Multi-modal Alignment using Representation Codebook
    Duan, Jiali
    Chen, Liqun
    Tran, Son
    Yang, Jinyu
    Xu, Yi
    Zeng, Belinda
    Chilimbi, Trishul
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 15630 - 15639
  • [44] Representation, Alignment, Fusion: A Generic Transformer-Based Framework for Multi-modal Glaucoma Recognition
    Zhou, You
    Yang, Gang
    Zhou, Yang
    Ding, Dayong
    Zhao, Jianchun
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION, MICCAI 2023, PT VII, 2023, 14226 : 704 - 713
  • [45] Multi-Modal Decentralized Interaction in Multi-Entity Systems
    Olaru, Andrei
    Pricope, Monica
    SENSORS, 2023, 23 (06)
  • [46] Cross-Modal Graph Attention Network for Entity Alignment
    Xu, Baogui
    Xu, Chengjin
    Su, Bing
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 3715 - 3723
  • [47] Structure Aware Multi-Graph Network for Multi-Modal Emotion Recognition in Conversations
    Zhang, Duzhen
    Chen, Feilong
    Chang, Jianlong
    Chen, Xiuyi
    Tian, Qi
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 3987 - 3997
  • [48] Visual Entity Linking via Multi-modal Learning
    Zheng, Qiushuo
    Wen, Hao
    Wang, Meng
    Qi, Guilin
    DATA INTELLIGENCE, 2022, 4 (01) : 1 - 19
  • [49] Richpedia: A Comprehensive Multi-modal Knowledge Graph
    Wang, Meng
    Qi, Guilin
    Wang, Haofen
    Zheng, Qiushuo
    SEMANTIC TECHNOLOGY, JIST 2019: PROCEEDINGS, 2020, 12032 : 130 - 145
  • [50] What Is a Multi-Modal Knowledge Graph: A Survey
    Peng, Jinghui
    Hu, Xinyu
    Huang, Wenbo
    Yang, Jian
    BIG DATA RESEARCH, 2023, 32