Graph structure prefix injection transformer for multi-modal entity alignment

Cited: 0
|
Authors
Zhang, Yan [1 ,2 ,3 ,4 ,5 ]
Luo, Xiangyu [2 ]
Hu, Jing [2 ]
Zhang, Miao [1 ,3 ,4 ]
Xiao, Kui [1 ,3 ,4 ]
Li, Zhifei [1 ,2 ,3 ,4 ,5 ]
Affiliations
[1] Hubei Univ, Sch Comp Sci, Wuhan 430062, Peoples R China
[2] Hubei Univ, Sch Cyber Sci & Technol, Wuhan 430062, Peoples R China
[3] Hubei Univ, Hubei Key Lab Big Data Intelligent Anal & Applicat, Wuhan 430062, Peoples R China
[4] Hubei Univ, Key Lab Intelligent Sensing Syst & Secur, Minist Educ, Wuhan 430062, Peoples R China
[5] Hubei Univ, Hubei Prov Engn Res Ctr Intelligent Connected Vehi, Wuhan 430062, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multi-modal knowledge graphs; Multi-modal entity alignment; Contrastive learning; KNOWLEDGE GRAPHS;
DOI
10.1016/j.ipm.2024.104048
CLC Classification Number
TP [Automation technology, computer technology];
Discipline Classification Code
0812 ;
Abstract
Multi-modal entity alignment (MMEA) aims to integrate corresponding entities across different multi-modal knowledge graphs (MMKGs). However, previous studies have not adequately considered the impact of graph structural heterogeneity on entity alignment. Different MMKGs typically differ in their graph structural features, so the same entity relationship can take distinct structural representations, and the graphs' topological structures also differ. To tackle these challenges, we introduce GSIEA, an MMEA framework that integrates structural prefix injection and modality fusion. Unlike methods that directly fuse structural data with multi-modal features to perform alignment, GSIEA processes structural data separately from multi-modal data such as images and attributes, incorporating a prefix-injection interaction module within a multi-head attention mechanism to make better use of multi-modal information and reduce the impact of graph structural differences. In addition, GSIEA employs a convolutional enhancement module to extract fine-grained multi-modal features and computes cross-modal weights to fuse them. Experimental evaluations on two public datasets, containing 12,846 and 11,199 entity pairs, respectively, show that GSIEA outperforms baseline models, with an average improvement of 3.26% in MRR (maximum gain 12.5%) and an average improvement of 4.96% in Hits@1 (maximum gain 16.92%). The code of our model is available at https://github.com/HubuKG/GSIEA.
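The prefix-injection idea described in the abstract can be illustrated with a minimal sketch: rather than fusing structural embeddings directly into the multi-modal features, learned structural "prefix" key/value vectors are prepended to the attention keys and values, so multi-modal queries can attend to structural information without it being mixed into their representations. This is a simplified single-head illustration with hypothetical shapes and randomly initialized vectors, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def prefix_attention(queries, keys, values, prefix_k, prefix_v):
    """Single-head attention with structural prefix injection.

    Learned structural prefix key/value vectors are prepended to the
    modality keys/values, so graph structure influences the attention
    distribution without being fused into the multi-modal features.
    """
    k = np.concatenate([prefix_k, keys], axis=0)    # (p + n, d)
    v = np.concatenate([prefix_v, values], axis=0)  # (p + n, d)
    d = queries.shape[-1]
    scores = queries @ k.T / np.sqrt(d)             # (m, p + n)
    weights = softmax(scores, axis=-1)              # rows sum to 1
    return weights @ v                              # (m, d)

# Toy dimensions: d = feature size, n = modality tokens,
# m = query tokens, p = structural prefix length (all hypothetical).
rng = np.random.default_rng(0)
d, n, m, p = 8, 5, 3, 2
out = prefix_attention(rng.normal(size=(m, d)),
                       rng.normal(size=(n, d)),
                       rng.normal(size=(n, d)),
                       rng.normal(size=(p, d)),
                       rng.normal(size=(p, d)))
print(out.shape)  # (3, 8)
```

In a full model the prefix vectors would be trainable parameters derived from the graph structure encoder, and the mechanism would be replicated per attention head.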
Pages: 17
Related Papers
50 total
  • [41] Multi-modal transformer for fake news detection
    Yang, Pingping
    Ma, Jiachen
    Liu, Yong
    Liu, Meng
    MATHEMATICAL BIOSCIENCES AND ENGINEERING, 2023, 20 (08) : 14699 - 14717
  • [42] Multi-hop neighbor fusion enhanced hierarchical transformer for multi-modal knowledge graph completion
    Wang, Yunpeng
    Ning, Bo
    Wang, Xin
    Li, Guanyu
    WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS, 2024, 27 (05)
  • [43] Multi-modal Alignment using Representation Codebook
    Duan, Jiali
    Chen, Liqun
    Tran, Son
    Yang, Jinyu
    Xu, Yi
    Zeng, Belinda
    Chilimbi, Trishul
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 15630 - 15639
  • [44] Representation, Alignment, Fusion: A Generic Transformer-Based Framework for Multi-modal Glaucoma Recognition
    Zhou, You
    Yang, Gang
    Zhou, Yang
    Ding, Dayong
    Zhao, Jianchun
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION, MICCAI 2023, PT VII, 2023, 14226 : 704 - 713
  • [45] Multi-Modal Decentralized Interaction in Multi-Entity Systems
    Olaru, Andrei
    Pricope, Monica
    SENSORS, 2023, 23 (06)
  • [46] Cross-Modal Graph Attention Network for Entity Alignment
    Xu, Baogui
    Xu, Chengjin
    Su, Bing
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 3715 - 3723
  • [47] Structure Aware Multi-Graph Network for Multi-Modal Emotion Recognition in Conversations
    Zhang, Duzhen
    Chen, Feilong
    Chang, Jianlong
    Chen, Xiuyi
    Tian, Qi
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 3987 - 3997
  • [48] Visual Entity Linking via Multi-modal Learning
    Zheng, Qiushuo
    Wen, Hao
    Wang, Meng
    Qi, Guilin
    DATA INTELLIGENCE, 2022, 4 (01) : 1 - 19
  • [49] Richpedia: A Comprehensive Multi-modal Knowledge Graph
    Wang, Meng
    Qi, Guilin
    Wang, Haofen
    Zheng, Qiushuo
    SEMANTIC TECHNOLOGY, JIST 2019: PROCEEDINGS, 2020, 12032 : 130 - 145
  • [50] What Is a Multi-Modal Knowledge Graph: A Survey
    Peng, Jinghui
    Hu, Xinyu
    Huang, Wenbo
    Yang, Jian
    BIG DATA RESEARCH, 2023, 32