Constructing biomedical domain-specific knowledge graph with minimum supervision

Cited by: 42
Authors
Yuan, Jianbo [1]
Jin, Zhiwei [2]
Guo, Han [2]
Jin, Hongxia [3]
Zhang, Xianchao [4]
Smith, Tristram [5]
Luo, Jiebo [1]
Affiliations
[1] Univ Rochester, Dept Comp Sci, Rochester, NY 14623 USA
[2] Chinese Acad Sci, Inst Comp Technol, Beijing, Peoples R China
[3] Samsung Res Amer, Mountain View, CA USA
[4] Dalian Univ Technol, Sch Software Technol, Dalian, Peoples R China
[5] Univ Rochester, Med Ctr, Dept Pediat, Rochester, NY 14642 USA
Funding
U.S. National Science Foundation;
Keywords
Knowledge graph construction; Biomedical; Domain-specific; Minimum supervision;
DOI
10.1007/s10115-019-01351-4
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
A domain-specific knowledge graph is an effective way to represent complex domain knowledge in a structured format, and such graphs have shown great success in real-world applications. Most existing work on knowledge graph construction and completion shares a common limitation: it requires substantial external resources, such as large-scale knowledge graphs and concept ontologies, as a starting point. However, such extensive domain-specific labeling is highly time-consuming and requires special expertise, especially in biomedical domains. Therefore, knowledge extraction from unstructured contexts with minimum supervision is crucial in biomedical fields. In this paper, we propose a versatile approach for constructing a knowledge graph with minimum supervision from unstructured biomedical domain-specific contexts. The approach comprises entity recognition, unsupervised entity and relation embedding, latent relation generation via clustering, relation refinement, and relation assignment, which assigns cluster-level labels. Experimental results on 24,687 unstructured biomedical science abstracts show that the proposed framework can effectively extract 16,192 structured facts with high precision. Moreover, we demonstrate that the constructed knowledge graph is a sufficient resource for knowledge graph completion and for inferring new knowledge from unseen contexts.
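The clustering and cluster-level labeling steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy 2-D entity vectors, the use of a tail-minus-head offset as the relation embedding, the plain k-means with first-k initialization, and the seed-label majority vote are all assumptions chosen only to make the idea concrete.

```python
from collections import Counter

# Hypothetical 2-D entity embeddings (real embeddings would be learned from text).
entities = {
    "aspirin": [0.0, 0.0], "headache": [1.0, 0.1],
    "insulin": [0.2, 1.0], "diabetes": [1.2, 1.0],
    "smoking": [2.0, 0.0], "lung cancer": [2.1, 1.0],
    "obesity": [3.0, 0.2], "hypertension": [3.0, 1.1],
}
pairs = [("aspirin", "headache"), ("insulin", "diabetes"),
         ("smoking", "lung cancer"), ("obesity", "hypertension")]

# Assumption: represent a relation as the offset tail - head.
relation_vecs = [[t - h for h, t in zip(entities[a], entities[b])]
                 for a, b in pairs]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20):
    """Plain k-means; the first k points serve as initial centroids (a simplification)."""
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        assignment = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment

clusters = kmeans(relation_vecs, k=2)

# Minimum supervision (hypothetical): one hand-labeled pair per relation type.
seed_labels = {0: "treats", 2: "causes"}

# Each cluster takes the majority label among its seeded members.
cluster_label = {}
for c in set(clusters):
    votes = Counter(lbl for i, lbl in seed_labels.items() if clusters[i] == c)
    cluster_label[c] = votes.most_common(1)[0][0] if votes else "unknown"

# Propagate the cluster-level label to every pair in the cluster.
facts = [(a, cluster_label[c], b) for (a, b), c in zip(pairs, clusters)]
for fact in facts:
    print(fact)
```

In this toy setting two seed labels are enough to label all four pairs, which is the sense in which labeling whole clusters rather than individual facts keeps supervision small.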
Pages: 317-336
Number of pages: 20