Machine-learned solutions for three stages of clinical information extraction: the state of the art at i2b2 2010

Cited by: 124

Authors:

de Bruijn, Berry [1]

Cherry, Colin [1]

Kiritchenko, Svetlana [1]

Martin, Joel [1]

Zhu, Xiaodan [1]

Affiliation:

[1] Natl Res Council Canada, Inst Informat Technol, Ottawa, ON K1A 0R6, Canada

Source:

JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION | 2011 / Volume 18 / Issue 05

DOI:

10.1136/amiajnl-2011-000150

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Objective: As clinical text mining continues to mature, its potential as an enabling technology for innovations in patient care and clinical research is becoming a reality. A critical part of that process is rigid benchmark testing of natural language processing methods on realistic clinical narrative. In this paper, the authors describe the design and performance of three state-of-the-art text-mining applications from the National Research Council of Canada on evaluations within the 2010 i2b2 challenge.

Design: The three systems perform three key steps in clinical information extraction: (1) extraction of medical problems, tests, and treatments from discharge summaries and progress notes; (2) classification of assertions made on the medical problems; (3) classification of relations between medical concepts. Machine learning systems performed these tasks using large-dimensional bags of features, as derived from both the text itself and from external sources: UMLS, cTAKES, and Medline.

Measurements: Performance was measured per subtask, using micro-averaged F-scores, as calculated by comparing system annotations with ground-truth annotations on a test set.

Results: The systems ranked high among all submitted systems in the competition, with the following F-scores: concept extraction 0.8523 (ranked first); assertion detection 0.9362 (ranked first); relationship detection 0.7313 (ranked second).

Conclusion: For all tasks, we found that the introduction of a wide range of features was crucial to success. Importantly, our choice of machine learning algorithms allowed us to be versatile in our feature design, and to introduce a large number of features without overfitting and without encountering computing-resource bottlenecks.
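The micro-averaged F-score named under Measurements can be illustrated with a minimal sketch. It assumes annotations are compared by exact match on a (start, end) span within each class; the challenge's actual matching criteria are not stated in this record, and the function name `micro_f1` and the toy data are hypothetical. Micro-averaging pools true positives, false positives, and false negatives across all classes before computing precision and recall, so frequent classes weigh more than rare ones.

```python
def micro_f1(gold, predicted):
    """Return (precision, recall, F1), micro-averaged over all classes.

    gold, predicted: dicts mapping a class label to a set of annotation
    keys, here (start, end) spans. Exact matching is an assumption.
    """
    tp = fp = fn = 0
    for label in set(gold) | set(predicted):
        g = gold.get(label, set())
        p = predicted.get(label, set())
        tp += len(g & p)   # annotations found in both
        fp += len(p - g)   # system annotations absent from the ground truth
        fn += len(g - p)   # ground-truth annotations the system missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy data (hypothetical): spans for the three concept classes.
gold = {
    "problem": {(0, 2), (5, 6)},
    "test": {(8, 9)},
    "treatment": {(11, 12)},
}
predicted = {
    "problem": {(0, 2)},
    "test": {(8, 9), (14, 15)},
    "treatment": set(),
}

p, r, f = micro_f1(gold, predicted)
print(f"precision={p:.4f} recall={r:.4f} f1={f:.4f}")
```

In this toy case the pooled counts are 2 true positives, 1 false positive, and 2 false negatives, so F1 equals 2·TP / (2·TP + FP + FN) = 4/7.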

Citation

Pages: 557-562

Page count: 6
