Extracting Information from Electronic Medical Records to Identify Obesity Status of a Patient Based on Comorbidities and Bodyweight Measures

Cited by: 0
Authors
Figueroa, Rosa L. [1]
Flores, Christopher A. [1]
Affiliations
[1] Univ Concepcion, Fac Ingn, Dept Ingn Elect, Concepcion, Chile
Source
AMBIENT INTELLIGENCE FOR HEALTH, AMIHEALTH 2015 | 2015 / Vol. 9456
Keywords
Machine learning; Natural language processing; Obesity; Comorbidities;
DOI
10.1007/978-3-319-26508-7_4
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Obesity is a chronic disease with an increasing impact on the world's population. In this work, we present a method to identify obesity using text mining techniques and information related to body weight measures and obesity comorbidities. We used a dataset of 2412 de-identified medical records that contains labels for two classification problems. The first problem distinguishes among obesity, overweight, normal weight, and underweight. The second addresses obesity types within the obesity category, distinguishing among super obesity, morbid obesity, severe obesity, and moderate obesity. We represented the records with a Bag of Words approach using unigram and bigram features. We used Support Vector Machine and Naive Bayes classifiers with ten-fold cross-validation to evaluate and compare performance. In general, our results show that Support Vector Machine performs better than Naive Bayes on both classification problems. We also observed that the bigram representation improves performance compared with the unigram representation.
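The feature representation and evaluation split described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the tokenizer, the function names `tokenize`, `ngram_counts`, and `k_fold_indices`, and the sample note are all hypothetical, and the split shown is a plain interleaved ten-fold partition (the abstract does not say whether the folds were stratified). Classifier training is omitted.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and keep runs of letters and digits (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngram_counts(text, n_values=(1, 2)):
    """Bag-of-words counts over unigrams and bigrams of a single record."""
    tokens = tokenize(text)
    counts = Counter()
    for n in n_values:
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def k_fold_indices(n_samples, k=10):
    """Yield (train, test) index lists for k roughly equal, interleaved folds."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


# Hypothetical clinical note, used only to show the feature shape.
note = "Patient has morbid obesity. BMI 42."
features = ngram_counts(note)

# Ten folds over a dataset of the size reported in the abstract.
splits = list(k_fold_indices(2412, k=10))
```

With bigrams enabled, a phrase such as "morbid obesity" becomes a single feature rather than two unrelated tokens, which is one plausible reason a bigram representation could help separate the obesity subtypes.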
Pages: 37-46
Page count: 10