Automatic extraction of numerical values from unstructured data in EHRs

Times cited: 3
Authors
Bigeard, Elise [1]
Jouhet, Vianney [2]
Mougin, Fleur [2]
Thiessard, Frantz [2]
Grabar, Natalia [1]
Affiliations
[1] Univ Lille 3, CNRS, UMR 8163, STL, Villeneuve d'Ascq, France
[2] Univ Bordeaux, INSERM, U897, ERIAS, ISPED, Bordeaux, France
Source
DIGITAL HEALTHCARE EMPOWERING EUROPEANS | 2015 / Vol. 210
Keywords
Natural Language Processing; Text Mining; Software Design; Information Storage and Retrieval; France; CLINICAL-DATA; TEXT
DOI
10.3233/978-1-61499-512-8-50
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Clinical data recorded in modern EHRs are very rich, although their secondary use for research and medical decision-making may be complicated (e.g., missing and incorrect data, data spread over several clinical databases, information available only within unstructured narrative documents). We propose to address the processing of narrative documents in order to detect and extract numerical values and to associate them with the corresponding concepts (or themes) and units. We propose to use supervised CRF categorisation to detect segments (themes, numerical sequences and units) and a rule-based system to associate these segments with one another in order to build semantically meaningful sequences. The average results obtained are competitive (0.96 precision, 0.78 recall, and 0.86 F-measure), and we plan to use the system with larger clinical data.
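The association step described in the abstract can be illustrated with a minimal sketch. It assumes that segments have already been labelled (for instance by a CRF tagger) and shows one possible rule for combining them into (theme, value, unit) triples. The label names `THEME`, `NUM`, `UNIT` and `O`, the function `associate`, the `max_gap` window and the decimal-comma handling are all hypothetical choices made for illustration; they are not the authors' actual rules.

```python
from typing import List, Optional, Tuple

# Assumed input format: (text, label) pairs produced by some upstream tagger.
# Labels are hypothetical: "THEME", "NUM", "UNIT", or "O" for anything else.
Segment = Tuple[str, str]
Triple = Tuple[Optional[str], float, Optional[str]]


def associate(segments: List[Segment], max_gap: int = 3) -> List[Triple]:
    """Attach each number to the nearest preceding theme and the next unit.

    Illustrative rule only: a theme is used if it occurs at most `max_gap`
    positions before the number; a unit is used only if it directly follows.
    """
    triples: List[Triple] = []
    last_theme: Optional[str] = None
    last_theme_pos: Optional[int] = None

    for i, (text, label) in enumerate(segments):
        if label == "THEME":
            last_theme, last_theme_pos = text, i
        elif label == "NUM":
            # Assumption: French-style decimal comma, e.g. "72,5".
            value = float(text.replace(",", "."))
            theme = None
            if last_theme_pos is not None and i - last_theme_pos <= max_gap:
                theme = last_theme
            unit = None
            if i + 1 < len(segments) and segments[i + 1][1] == "UNIT":
                unit = segments[i + 1][0]
            triples.append((theme, value, unit))
    return triples


example = [
    ("poids", "THEME"), (":", "O"), ("72,5", "NUM"), ("kg", "UNIT"),
    ("taille", "THEME"), ("175", "NUM"), ("cm", "UNIT"),
]
print(associate(example))
```

A number with no theme inside the window, or with no unit directly after it, still yields a triple with `None` in the missing slot, so that incomplete sequences remain visible for later review.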
Pages: 50-54
Page count: 5