Community knowledge graph abstraction for enhanced link prediction: A study on PubMed knowledge graph

Cited by: 1
Authors
Zhao, Yang [1]
Bollegala, Danushka [2]
Hirose, Shunsuke [1]
Jin, Yingzi [1]
Kozu, Tomotake [1]
Affiliations
[1] Deloitte Touche Tohmatsu LLC, Deloitte Analyt R&D, 3-2-3 Marunouchi,Chiyoda Ku, Tokyo 1008360, Japan
[2] Univ Liverpool, Dept Comp Sci, Liverpool L69 3BX, England
Keywords
PKG; CKG; KGE; Entity distance-based method; Link prediction; Backtracking process;
DOI
10.1016/j.jbi.2024.104725
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Objective: As new knowledge is produced at a rapid pace in the biomedical field, existing biomedical Knowledge Graphs (KGs) cannot be manually updated in a timely manner. Previous work in Natural Language Processing (NLP) has leveraged link prediction to infer missing knowledge in general-purpose KGs. Inspired by this, we propose applying link prediction to existing biomedical KGs to infer missing knowledge. Although Knowledge Graph Embedding (KGE) methods are effective in link prediction tasks, they are less capable of capturing relations between communities of entities with specific attributes (Fanourakis et al., 2023).
Methods: To address this challenge, we propose an entity distance-based method for abstracting a Community Knowledge Graph (CKG) from a simplified version of the pre-existing PubMed Knowledge Graph (PKG) (Xu et al., 2020). For link prediction on the abstracted CKG, we propose an extension to existing KGE models that links the information in the PKG to the abstracted CKG. The applicability of this extension was demonstrated with six well-known KGE models: TransE, TransH, DistMult, ComplEx, SimplE, and RotatE. Mean Rank (MR), Mean Reciprocal Rank (MRR), and Hits@k were used to assess link prediction performance. In addition, we present a backtracking process that traces the results of CKG link prediction back to the PKG scale for further comparison.
Results: Six different CKGs were abstracted from the PKG using the embeddings of the six KGE methods. Link prediction results on these abstracted CKGs indicate that the proposed extension improves the existing KGE methods, achieving a top-10 accuracy of 0.69 compared to 0.5 for TransE, 0.7 compared to 0.54 for TransH, 0.67 compared to 0.6 for DistMult, 0.73 compared to 0.57 for ComplEx, 0.73 compared to 0.63 for SimplE, and 0.85 compared to 0.76 for RotatE on their respective CKGs. These improvements also highlight the wide applicability of the extension approach.
Conclusion: This study offers novel insights into abstracting CKGs from the PKG. The extension approach improved the performance of the existing KGE methods and is broadly applicable. As future work, we plan to conduct link prediction for entities that are newly introduced to the PKG.
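The evaluation metrics named in the abstract, Mean Rank (MR), Mean Reciprocal Rank (MRR), and Hits@k, are standard rank-based measures. The following is a minimal Python sketch of how they are commonly computed, assuming each test triple has already been assigned the 1-based rank of its correct entity among all candidates. The function names and the sample ranks are illustrative assumptions and do not come from the paper.

```python
from typing import Sequence


def mean_rank(ranks: Sequence[int]) -> float:
    """Average rank of the correct entity (lower is better)."""
    return sum(ranks) / len(ranks)


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Average of 1/rank over all test triples (higher is better)."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at_k(ranks: Sequence[int], k: int) -> float:
    """Fraction of test triples whose correct entity is ranked in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Hypothetical ranks for five test triples (not data from the study).
sample_ranks = [1, 3, 12, 2, 50]

print("MR:", mean_rank(sample_ranks))
print("MRR:", round(mean_reciprocal_rank(sample_ranks), 4))
print("Hits@10:", hits_at_k(sample_ranks, 10))
```

Under this reading, the "top-10 accuracy" figures reported in the Results would correspond to the fraction of test triples ranked within the first ten candidates, though the abstract itself does not spell out the exact ranking protocol.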
Pages: 11
Related papers
49 in total
[1]   Comparing methods for drug-gene interaction prediction on the biomedical literature knowledge graph: performance versus explainability [J].
Aisopos, Fotis ;
Paliouras, Georgios .
BMC BIOINFORMATICS, 2023, 24 (01)
[2]   Knowledge Graph-Based Framework for Decision Making Process with Limited Interaction [J].
Albagli-Kim, Sivan ;
Beimel, Dizza .
MATHEMATICS, 2022, 10 (21)
[3]  
Bordes A., 2013, Advances in Neural Information Processing Systems, V26
[4]   Knowledge-Based Biomedical Data Science [J].
Callahan, Tiffany J. ;
Tripodi, Ignacio J. ;
Pielke-Lombardo, Harrison ;
Hunter, Lawrence E. .
ANNUAL REVIEW OF BIOMEDICAL DATA SCIENCE, 2020, 3 :23-41
[5]  
COHEN PR, 1988, AI MAG, V9, P35
[6]   Forty years of SNOMED: a literature review [J].
Cornet, Ronald ;
de Keizer, Nicolette .
BMC MEDICAL INFORMATICS AND DECISION MAKING, 2008, 8 (Suppl 1)
[7]   I-DIVERGENCE GEOMETRY OF PROBABILITY DISTRIBUTIONS AND MINIMIZATION PROBLEMS [J].
CSISZAR, I .
ANNALS OF PROBABILITY, 1975, 3 (01) :146-158
[8]   Improving Quality of Electronic Health Records with SNOMED [J].
Duarte, Julio ;
Castro, Sara ;
Santos, Manuel ;
Abelha, Antonio ;
Machado, Jose .
CENTERIS 2014 - CONFERENCE ON ENTERPRISE INFORMATION SYSTEMS / PROJMAN 2014 - INTERNATIONAL CONFERENCE ON PROJECT MANAGEMENT / HCIST 2014 - INTERNATIONAL CONFERENCE ON HEALTH AND SOCIAL CARE INFORMATION SYSTEMS AND TECHNOLOGIES, 2014, 16 :1342-1350
[9]   MedGraph: A semantic biomedical information retrieval framework using knowledge graph embedding for PubMed [J].
Ebeid, Islam Akef .
FRONTIERS IN BIG DATA, 2022, 5
[10]   Knowledge graph embedding methods for entity alignment: experimental review [J].
Fanourakis, Nikolaos ;
Efthymiou, Vasilis ;
Kotzinos, Dimitris ;
Christophides, Vassilis .
DATA MINING AND KNOWLEDGE DISCOVERY, 2023, 37 (05) :2070-2137