Neuro-symbolic representation learning on biological knowledge graphs

Cited by: 95
Authors
Alshahrani, Mona [1]
Khan, Mohammad Asif [1]
Maddouri, Omar [1, 2]
Kinjo, Akira R. [3]
Queralt-Rosinach, Nuria [4]
Hoehndorf, Robert [1]
Affiliations
[1] King Abdullah Univ Sci & Technol, Computat Biosci Res Ctr, Comp Elect & Math Sci & Engn Div, Thuwal 239556900, Saudi Arabia
[2] Hamad Bin Khalifa Univ, Coll Sci & Engn, Life Sci Div, Doha, Qatar
[3] Osaka Univ, Inst Prot Res, 3-2 Yamadaoka, Suita, Osaka 5650871, Japan
[4] Scripps Res Inst, Dept Integrat Struct & Computat Biol, La Jolla, CA 92037 USA
Funding
European Union Seventh Framework Programme;
Keywords
ONTOLOGY; INFORMATION; DISEASE; OWL;
DOI
10.1093/bioinformatics/btx275
Chinese Library Classification number
Q5 [Biochemistry];
Subject classification codes
071010 ; 081704 ;
Abstract
Motivation: Biological data and knowledge bases increasingly rely on Semantic Web technologies and the use of knowledge graphs for data integration, retrieval and federated queries. In the past years, feature learning methods that are applicable to graph-structured data are becoming available, but have not yet widely been applied and evaluated on structured biological knowledge.
Results: We develop a novel method for feature learning on biological knowledge graphs. Our method combines symbolic methods, in particular knowledge representation using symbolic logic and automated reasoning, with neural networks to generate embeddings of nodes that encode for related information within knowledge graphs. Through the use of symbolic logic, these embeddings contain both explicit and implicit information. We apply these embeddings to the prediction of edges in the knowledge graph representing problems of function prediction, finding candidate genes of diseases, protein-protein interactions, or drug target relations, and demonstrate performance that matches and sometimes outperforms traditional approaches based on manually crafted features. Our method can be applied to any biological knowledge graph, and will thereby open up the increasing amount of Semantic Web based knowledge bases in biology to use in machine learning and data analytics.
Pages: 2723-2730
Number of pages: 8