A Method for Building a Labeled Named Entity Recognition Corpus Using Ontologies

Cited by: 0
Authors
Ngoc-Trinh Vu [1, 2]
Van-Hien Tran [1]
Thi-Huyen-Trang Doan [1]
Hoang-Quynh Le [1]
Mai-Vu Tran [1]
Affiliations
[1] Vietnam Natl Univ Hanoi, Univ Engn & Technol, Knowledge Technol Lab, Hanoi, Vietnam
[2] Vietnam Natl Oil & Gas Grp, Vietnam Petr Inst, Hanoi, Vietnam
Source
ADVANCED COMPUTATIONAL METHODS FOR KNOWLEDGE ENGINEERING | 2015 / Vol. 358
Keywords
Named entity recognition; Phenotype; Machine learning; Biomedical ontology;
DOI
10.1007/978-3-319-17996-4_13
Chinese Library Classification
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Building a labeled corpus that contains sufficient data and good coverage, while keeping cost, effort and time under control, is a popular research topic in natural language processing. Constructing training data automatically or semi-automatically has become a matter of interest to the research community. For this reason, we consider the problem of building a corpus for phenotype entity recognition, together with class-specific feature detectors from unlabeled data, based on over 10260 unique terms (more than 15000 synonyms) describing human phenotypic features in the Human Phenotype Ontology (HPO) and about 9000 unique terms (about 24000 synonyms) of mouse abnormal phenotype descriptions in the Mammalian Phenotype Ontology. The corpus was evaluated on three corpora, the Khordad corpus, Phenominer 2012 and Phenominer 2013, using the Maximum Entropy and Beam Search methods. The performance is good on the three corpora, with F-scores of 31.71% for the Phenominer 2012 corpus, 35.77% for the Phenominer 2013 corpus, and 78.36% for the Khordad corpus.
Pages: 141-149
Page count: 9