Estimating disease prevalence from drug utilization data using the Random Forest algorithm

Cited by: 20
Authors
Slobbe, Laurentius C. J. [1 ,2 ]
Fussenich, Koen [1 ,3 ]
Wong, Albert [1 ]
Boshuizen, Hendriek C. [1 ,4 ]
Nielen, Markus M. J. [1 ,5 ]
Polder, Johan J. [1 ,2 ]
Feenstra, Talitha L. [1 ,3 ]
van Oers, Hans A. M. [1 ,2 ]
Affiliations
[1] Natl Inst Publ Hlth & Environm RIVM, Bilthoven, Netherlands
[2] Tilburg Univ, Dept Tranzo, Tilburg, Netherlands
[3] Univ Groningen, Univ Med Ctr, Dept Epidemiol, Groningen, Netherlands
[4] Wageningen Univ & Res, Wageningen, Netherlands
[5] Netherlands Inst Hlth Serv Res NIVEL, Utrecht, Netherlands
Keywords
DIABETES-MELLITUS; NATIONAL-HEALTH; GERMANY; TRENDS; CARE
DOI
10.1093/eurpub/cky270
Chinese Library Classification (CLC) number
R1 [Preventive medicine, hygiene]
Discipline classification code
1004; 120402
Abstract
Background: Aggregated claims data on medication are often used as a proxy for the prevalence of diseases, especially chronic diseases. However, linkages between medication and diagnosis tend to be theory based and not very precise. Modelling disease probability at the individual level using individual-level data may yield more accurate results. Methods: Individual probabilities of having a certain chronic disease were estimated using the Random Forest (RF) algorithm. A training set was created from a general practitioners' database of 276 723 cases that included diagnoses and claims data on medication. Model performance for 29 chronic diseases was evaluated using receiver operating characteristic (ROC) curves, summarized by the area under the curve (AUC). Results: The diseases for which model performance was best were Parkinson's disease (AUC = .89, 95% CI = .77-1.00), diabetes (AUC = .87, 95% CI = .85-.90), osteoporosis (AUC = .87, 95% CI = .81-.92) and heart failure (AUC = .81, 95% CI = .74-.88). Five other diseases had an AUC > .75: asthma, chronic enteritis, COPD, epilepsy and HIV/AIDS. For 16 of 17 diseases tested, the medication categories used in theory-based algorithms were also identified by our method; however, the RF models included a broader range of medications as important predictors. Conclusion: Data on medication use can be a useful predictor when estimating the prevalence of several chronic diseases. To improve the estimates for a broader range of chronic diseases, research should use better training data, include more details concerning dosages and duration of prescriptions, and add related predictors such as hospitalizations.
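The two quantities the abstract relies on, the AUC of individual disease probabilities and a prevalence estimate derived from them, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy labels and scores are hypothetical, the function names `auc` and `estimated_prevalence` are introduced here for illustration, and averaging individual probabilities to obtain a prevalence is an assumption about how such probabilities might be aggregated. The AUC is computed with the standard rank-based identity (the probability that a randomly chosen case scores higher than a randomly chosen non-case).

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the pairwise (Mann-Whitney) identity.

    Equals the probability that a randomly chosen positive case receives a
    higher score than a randomly chosen negative case; ties count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def estimated_prevalence(probabilities: Sequence[float]) -> float:
    """Assumed aggregation: the mean individual probability is the
    expected share of people with the disease."""
    return sum(probabilities) / len(probabilities)


# Hypothetical toy data: 1 = diagnosed, 0 = not diagnosed.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
# Hypothetical model outputs (individual disease probabilities).
scores = [0.9, 0.7, 0.4, 0.5, 0.3, 0.2, 0.1, 0.1]

print(f"AUC = {auc(labels, scores):.3f}")
print(f"Estimated prevalence = {estimated_prevalence(scores):.3f}")
```

In practice the scores would come from a fitted classifier such as a Random Forest rather than being typed in by hand, and the pairwise loop would be replaced by a sort-based computation for data sets of the size described above.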
Pages: 615-621
Number of pages: 8