Explaining protein-protein interactions with knowledge graph-based semantic similarity

Cited by: 3
Authors
Sousa, Rita T. [1]
Silva, Sara [1]
Pesquita, Catia [1]
Affiliations
[1] Univ Lisbon, LASIGE, Fac Ciencias, Lisbon, Portugal
Keywords
Machine learning; Explainable artificial intelligence; Knowledge graph; Semantic similarity; Protein-protein interaction prediction;
DOI
10.1016/j.compbiomed.2024.108076
Chinese Library Classification
Q [Biological sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
The application of artificial intelligence and machine learning methods to biomedical problems, such as protein-protein interaction prediction, has gained significant traction in recent decades. However, explainability is a key aspect of using machine learning as a tool for scientific discovery. Explainable artificial intelligence approaches help clarify algorithmic mechanisms and identify potential bias in the data. Given the complexity of the biomedical domain, explanations should be grounded in domain knowledge, which can be achieved by using ontologies and knowledge graphs. These knowledge graphs express knowledge about a domain by capturing different perspectives of the representation of real-world entities. However, the most popular way to explore knowledge graphs with machine learning is through embeddings, which are not explainable. As an alternative, knowledge graph-based semantic similarity offers the advantage of being explainable. Additionally, similarity can be computed to capture different semantic aspects within the knowledge graph, thereby increasing the explainability of predictive approaches. We propose a novel method to generate explainable vector representations, KGsim2vec, that uses aspect-oriented semantic similarity features to represent pairs of entities in a knowledge graph. Our approach employs a set of machine learning models, including decision trees, genetic programming, random forest and eXtreme gradient boosting, to predict relations between entities. The experiments reveal that considering multiple semantic aspects when representing the similarity between two entities improves explainability and predictive performance. KGsim2vec performs better than black-box methods based on knowledge graph embeddings or graph neural networks. Moreover, KGsim2vec produces global models that can capture biological phenomena and elucidate data biases.
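The idea of representing an entity pair by one similarity value per semantic aspect can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy ontology, the three aspect names, the annotations, the Jaccard measure over ancestor-closed term sets, and the single-threshold rule standing in for a decision tree. The abstract does not state which similarity measures or hyperparameters KGsim2vec actually uses.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy ontology: child term -> parent terms, one sub-graph per aspect.
PARENTS: Dict[str, Dict[str, Set[str]]] = {
    "process": {"p1": {"p0"}, "p2": {"p0"}, "p3": {"p1"}},
    "function": {"f1": {"f0"}, "f2": {"f0"}},
    "component": {"c1": {"c0"}, "c2": {"c0"}},
}

# Hypothetical annotations: protein -> aspect -> annotated terms.
ANNOTATIONS: Dict[str, Dict[str, Set[str]]] = {
    "A": {"process": {"p3"}, "function": {"f1"}, "component": {"c1"}},
    "B": {"process": {"p1"}, "function": {"f1"}, "component": {"c2"}},
    "C": {"process": {"p2"}, "function": {"f2"}, "component": {"c1"}},
}

ASPECTS: Tuple[str, ...] = ("process", "function", "component")


def with_ancestors(terms: Set[str], parents: Dict[str, Set[str]]) -> Set[str]:
    """Return the terms together with all of their ancestors."""
    closed: Set[str] = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in closed:
            closed.add(term)
            stack.extend(parents.get(term, ()))
    return closed


def aspect_similarity(a: str, b: str, aspect: str) -> float:
    """Jaccard similarity of ancestor-closed annotation sets (assumed measure)."""
    sa = with_ancestors(ANNOTATIONS[a][aspect], PARENTS[aspect])
    sb = with_ancestors(ANNOTATIONS[b][aspect], PARENTS[aspect])
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def pair_features(a: str, b: str) -> List[float]:
    """One similarity value per aspect, so each feature keeps a readable meaning."""
    return [aspect_similarity(a, b, aspect) for aspect in ASPECTS]


def fit_stump(X: List[List[float]], y: List[bool]) -> Tuple[int, float]:
    """Pick the rule 'feature j >= threshold' with the fewest training errors."""
    best = (len(y) + 1, 0, 0.0)
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            errors = sum((row[j] >= thr) != label for row, label in zip(X, y))
            if errors < best[0]:
                best = (errors, j, thr)
    return best[1], best[2]


pairs = [("A", "B"), ("A", "C"), ("B", "C")]
labels = [True, False, False]  # made-up interaction labels
X = [pair_features(a, b) for a, b in pairs]
feature_index, threshold = fit_stump(X, labels)
print(f"predict interaction if {ASPECTS[feature_index]} similarity >= {threshold:.2f}")
```

Because each feature is tied to a named aspect, the learned rule can be read directly in domain terms, which is the kind of explainability the abstract contrasts with embedding-based representations.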
Pages: 14
Related papers
69 in total
  • [1] Large-scale structural and textual similarity-based mining of knowledge graph to predict drug-drug interactions
    Abdelaziz, Ibrahim
    Fokoue, Achille
    Hassanzadeh, Oktie
    Zhang, Ping
    Sadoghi, Mohammad
    [J]. JOURNAL OF WEB SEMANTICS, 2017, 44 : 104 - 117
  • [2] Anguita-Ruiz A, 2020, PLOS COMPUT BIOL, V16, DOI [10.1371/journal.pcbi.1007792, 10.1371/journal.pcbi.1007792.r001, 10.1371/journal.pcbi.1007792.r002, 10.1371/journal.pcbi.1007792.r003, 10.1371/journal.pcbi.1007792.r004]
  • [3] Identifying disease genes using machine learning and gene functional similarities, assessed through Gene Ontology
    Asif, Muhammad
    Martiniano, Hugo F. M. C. M.
    Vicente, Astrid M.
    Couto, Francisco M.
    [J]. PLOS ONE, 2018, 13 (12):
  • [4] A New Feature Vector Based on Gene Ontology Terms for Protein-Protein Interaction Prediction
    Bandyopadhyay, Sanghamitra
    Mallick, Koushik
    [J]. IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2017, 14 (04) : 762 - 770
  • [5] Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
    Barredo Arrieta, Alejandro
    Diaz-Rodriguez, Natalia
    Del Ser, Javier
    Bennetot, Adrien
    Tabik, Siham
    Barbado, Alberto
    Garcia, Salvador
    Gil-Lopez, Sergio
    Molina, Daniel
    Benjamins, Richard
    Chatila, Raja
    Herrera, Francisco
    [J]. INFORMATION FUSION, 2020, 58 : 82 - 115
  • [6] Bianchi F, 2020, STUD SEMANTIC WEB, V47, P49, DOI 10.3233/SSW200011
  • [7] Bordes A., 2013, ADV NEURAL INFORM PR, V2013, P2787, DOI 10.5555/2999792.2999923
  • [8] GraphGONet: a self-explaining neural network encapsulating the Gene Ontology graph for phenotype prediction on gene expression
    Bourgeais, Victoria
    Zehraoui, Farida
    Hanczar, Blaise
    [J]. BIOINFORMATICS, 2022, 38 (09) : 2504 - 2511
  • [9] ProteinBERT: a universal deep-learning model of protein sequence and function
    Brandes, Nadav
    Ofer, Dan
    Peleg, Yam
    Rappoport, Nadav
    Linial, Michal
    [J]. BIOINFORMATICS, 2022, 38 (08) : 2102 - 2110
  • [10] A Comprehensive Survey of Graph Embedding: Problems, Techniques, and Applications
    Cai, HongYun
    Zheng, Vincent W.
    Chang, Kevin Chen-Chuan
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2018, 30 (09) : 1616 - 1637