Medication information extraction with linguistic pattern matching and semantic rules

Cited by: 43

Authors:

Spasic, Irena [1]

Sarafraz, Farzaneh [2]

Keane, John A. [2]

Nenadic, Goran [2]

Affiliations:

[1] Cardiff Univ, Cardiff Sch Comp Sci & Informat, Cardiff CF24 3AA, S Glam, Wales

[2] Univ Manchester, Sch Comp Sci, Manchester, Lancs, England

Source:

JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION | 2010, Vol. 17, No. 5

Funding:

UK Biotechnology and Biological Sciences Research Council

Keywords:

DOI:

10.1136/jamia.2010.003657

Chinese Library Classification:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Objective: This study presents a system developed for the 2009 i2b2 Challenge in Natural Language Processing for Clinical Data, whose aim was to automatically extract information about the medications used by a patient from his/her medical report. The following information was to be extracted for each medication: name, dosage, mode/route, frequency, duration and reason.

Design: The system implements a rule-based methodology, which exploits typical morphological, lexical, syntactic and semantic features of the targeted information. These features were acquired from the training dataset and from public resources such as the UMLS and relevant web pages. Information extracted by pattern matching was combined using context-sensitive heuristic rules.

Measurements: The system was applied to a set of 547 previously unseen discharge summaries, and the extracted information was evaluated against a manually prepared gold standard consisting of 251 documents. The overall ranking of the participating teams was obtained using the micro-averaged F-measure as the primary evaluation metric.

Results: The implemented method achieved a micro-averaged F-measure of 81% (with 86% precision and 77% recall), which ranked this system third in the challenge. Significance tests showed that the system's performance was not significantly different from that of the second-ranked system. Relative to other systems, this system achieved the best F-measure for the extraction of duration (53%) and reason (46%).

Conclusion: Based on the F-measure, the performance achieved (81%) was in line with the initial agreement between human annotators (82%), indicating that such a system may greatly facilitate the extraction of relevant information from medical records by providing a solid basis for a manual review process.


Pages: 532-535

Page count: 4

