Constructing biomedical domain-specific knowledge graph with minimum supervision

Cited by: 44
Authors
Yuan, Jianbo [1]
Jin, Zhiwei [2]
Guo, Han [2]
Jin, Hongxia [3]
Zhang, Xianchao [4]
Smith, Tristram [5]
Luo, Jiebo [1]
Affiliations
[1] Univ Rochester, Dept Comp Sci, Rochester, NY 14623 USA
[2] Chinese Acad Sci, Inst Comp Technol, Beijing, Peoples R China
[3] Samsung Res Amer, Mountain View, CA USA
[4] Dalian Univ Technol, Sch Software Technol, Dalian, Peoples R China
[5] Univ Rochester, Med Ctr, Dept Pediat, Rochester, NY 14642 USA
Funding
U.S. National Science Foundation
Keywords
Knowledge graph construction; Biomedical; Domain-specific; Minimum supervision
DOI
10.1007/s10115-019-01351-4
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A domain-specific knowledge graph is an effective way to represent complex domain knowledge in a structured format, and such graphs have shown great success in real-world applications. Most existing work on knowledge graph construction and completion shares a limitation: it requires substantial external resources, such as large-scale knowledge graphs and concept ontologies, as a starting point. However, such extensive domain-specific labeling is highly time-consuming and requires special expertise, especially in biomedical domains. Knowledge extraction from unstructured text with minimum supervision is therefore crucial in biomedical fields. In this paper, we propose a versatile approach for knowledge graph construction with minimum supervision from unstructured biomedical domain-specific text. The approach consists of entity recognition, unsupervised entity and relation embedding, latent relation generation via clustering, relation refinement, and relation assignment, which assigns cluster-level labels. Experimental results on 24,687 unstructured biomedical science abstracts show that the proposed framework can effectively extract 16,192 structured facts with high precision. Moreover, we demonstrate that the constructed knowledge graph is a sufficient resource for knowledge graph completion and for inferring new knowledge from unseen contexts.
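The pipeline named in the abstract (entity recognition, relation representation, clustering into latent relations, refinement, cluster-level labeling) can be illustrated with a toy sketch. This is not the authors' implementation: the sentences, the entity lexicon, the bag-of-words vectors, the greedy clustering, and the names `extract_candidates`, `cluster`, and `build_graph` are all assumptions chosen to keep the example small and dependency-free, and they stand in for the learned embeddings and clustering the paper describes.

```python
from collections import Counter
from itertools import combinations
import math

# Hypothetical toy inputs; the paper's corpus and entity lexicon are not reproduced here.
SENTENCES = [
    "aspirin reduces fever",
    "ibuprofen reduces inflammation",
    "smoking causes cancer",
    "obesity causes diabetes",
]
ENTITIES = {"aspirin", "fever", "ibuprofen", "inflammation",
            "smoking", "cancer", "obesity", "diabetes"}
STOPWORDS = {"the", "a", "of", "in"}


def extract_candidates(sentence):
    """Entity recognition stand-in: a dictionary lookup, not a trained recognizer."""
    tokens = sentence.lower().split()
    positions = [i for i, tok in enumerate(tokens) if tok in ENTITIES]
    for i, j in combinations(positions, 2):
        # Represent the candidate relation by the words between the two entities.
        context = [t for t in tokens[i + 1:j] if t not in STOPWORDS]
        if context:
            yield tokens[i], tokens[j], Counter(context)


def cosine(a, b):
    """Cosine similarity between two sparse bag-of-words vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(candidates, threshold=0.5):
    """Embedding and clustering stand-in: greedy single-pass grouping by cosine similarity."""
    clusters = []  # each cluster: {"centroid": Counter, "members": [(head, tail), ...]}
    for head, tail, vec in candidates:
        best = max(clusters, key=lambda c: cosine(c["centroid"], vec), default=None)
        if best is not None and cosine(best["centroid"], vec) >= threshold:
            best["centroid"].update(vec)
            best["members"].append((head, tail))
        else:
            clusters.append({"centroid": Counter(vec), "members": [(head, tail)]})
    return clusters


def build_graph(sentences, min_size=2):
    """Run the toy pipeline and return (head, relation, tail) triples."""
    candidates = [c for s in sentences for c in extract_candidates(s)]
    triples = []
    for c in cluster(candidates):
        if len(c["members"]) < min_size:  # refinement stand-in: drop tiny clusters
            continue
        label = c["centroid"].most_common(1)[0][0]  # one label for the whole cluster
        triples.extend((h, label, t) for h, t in c["members"])
    return triples


triples = build_graph(SENTENCES)
for triple in triples:
    print(triple)
```

The point of the sketch is the labeling economy: a label is chosen once per cluster rather than once per extracted pair, which is where the "minimum supervision" in the abstract comes from. Here the label is picked automatically from the most frequent context word, which is a simplification of this sketch only.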
Pages: 317-336
Number of pages: 20