Using an ensemble system to improve concept extraction from clinical records

Cited by: 24
Authors
Kang, Ning [1]
Afzal, Zubair [1]
Singh, Bharat [1]
van Mulligen, Erik M. [1]
Kors, Jan A. [1]
Affiliations
[1] Erasmus Univ, Med Ctr, Dept Med Informat, NL-3000 CA Rotterdam, Netherlands
Keywords
Clinical record analysis; Natural language processing; Voting scheme; Ensemble system; INFORMATION EXTRACTION; ASSERTIONS; FUSION; TEXT; DOCUMENTS; UMLS
DOI
10.1016/j.jbi.2011.12.009
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Recognition of medical concepts is a basic step in information extraction from clinical records. We wished to improve on the performance of a variety of concept recognition systems by combining their individual results. We selected two dictionary-based systems and five statistical systems that were trained to annotate medical problems, tests, and treatments in clinical records. Manually annotated clinical records for training and testing were made available through the 2010 i2b2/VA (Informatics for Integrating Biology and the Bedside) challenge. Results of individual systems were combined by a simple voting scheme. The statistical systems were trained on a set of 349 records. Performance (precision, recall, F-score) was assessed on a test set of 477 records, using varying voting thresholds. The combined annotation system achieved a best F-score of 82.2% (recall 81.2%, precision 83.3%) on the test set, a score that ranks third among 22 participants in the i2b2/VA concept annotation task. The ensemble system had better precision and recall than any of the individual systems, yielding an F-score that is 4.6 percentage points higher than that of the best single system. Changing the voting threshold offered a simple way to obtain a system with high precision (and moderate recall) or one with high recall (and moderate precision). The ensemble-based approach is straightforward and allows the precision of the combined system to be balanced against its recall. The ensemble system is freely available and can easily be extended, integrated in other systems, and retrained. (C) 2012 Elsevier Inc. All rights reserved.
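The voting scheme described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the annotation representation (a `(start, end, concept_type)` tuple), the exact-match criterion, the function names `vote` and `precision_recall_f`, and the toy data are all assumptions made for illustration. An annotation is kept when at least `threshold` systems propose it, so a low threshold favors recall and a high threshold favors precision.

```python
from collections import Counter


def vote(system_outputs, threshold):
    """Keep every annotation proposed by at least `threshold` systems.

    `system_outputs` is a list with one collection of annotations per system.
    Each annotation is a hypothetical (start, end, concept_type) tuple, and
    two annotations agree only if they match exactly (an assumption).
    """
    votes = Counter()
    for annotations in system_outputs:
        votes.update(set(annotations))  # each system votes once per annotation
    return {ann for ann, count in votes.items() if count >= threshold}


def precision_recall_f(predicted, gold):
    """Return (precision, recall, F-score) for two sets of annotations."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy data (invented for illustration, not taken from the i2b2/VA corpus).
gold = {(0, 2, "problem"), (5, 6, "test"), (9, 11, "treatment")}
system_a = {(0, 2, "problem"), (5, 6, "test")}
system_b = {(0, 2, "problem"), (9, 11, "treatment"), (14, 15, "test")}
system_c = {(0, 2, "problem"), (5, 6, "test"), (14, 15, "test")}
systems = [system_a, system_b, system_c]

for threshold in (1, 2, 3):
    combined = vote(systems, threshold)
    p, r, f = precision_recall_f(combined, gold)
    print(f"threshold={threshold}: precision={p:.2f} recall={r:.2f} F={f:.2f}")
```

On this toy data, a threshold of 1 recovers every gold annotation at the cost of one false positive, while a threshold of 3 keeps only the annotation on which all three systems agree. The F-score here is the harmonic mean of precision and recall, which is consistent with the reported figures: 2 × 83.3 × 81.2 / (83.3 + 81.2) ≈ 82.2.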
Pages: 423-428
Number of pages: 6