Machine learning prediction of incidence of Alzheimer’s disease using large-scale administrative health data

Cited by: 0
Authors
Ji Hwan Park
Han Eol Cho
Jong Hun Kim
Melanie M. Wall
Yaakov Stern
Hyunsun Lim
Shinjae Yoo
Hyoung Seop Kim
Jiook Cha
Affiliations
[1] Brookhaven National Laboratory, Computational Science Initiative
[2] Yonsei University College of Medicine, Department of Rehabilitation Medicine, Gangnam Severance Hospital and Rehabilitation Institute of Neuromuscular Disease
[3] National Health Insurance Service Ilsan Hospital, Department of Neurology, Dementia Center
[4] Columbia University, Department of Psychiatry, Vagelos College of Physicians and Surgeons
[5] Columbia University, Department of Neurology, Vagelos College of Physicians and Surgeons
[6] National Health Insurance Service Ilsan Hospital, Research and Analysis Team
[7] Dementia Center, Department of Physical Medicine and Rehabilitation
[8] National Health Insurance Service Ilsan Hospital, Department of Psychology
[9] Seoul National University, Department of Brain & Cognitive Sciences
[10] Seoul National University, Graduate School of Data Science
[11] Seoul National University
Source
npj Digital Medicine, volume 3
Keywords
DOI
Not available
CLC number
Subject classification number
Abstract
Nationwide population-based cohorts provide a new opportunity to build automated risk prediction models based on individuals’ history of health and healthcare, going beyond existing risk prediction models. We tested whether machine learning models can predict the future incidence of Alzheimer’s disease (AD) using large-scale administrative health data. From the Korean National Health Insurance Service database between 2002 and 2010, we obtained de-identified health data for elders above 65 years (N = 40,736) containing 4,894 unique clinical features, including ICD-10 codes, medication codes, laboratory values, history of personal and family illness, and socio-demographics. To define incident AD, we considered two operational definitions: “definite AD”, with diagnostic codes and dementia medication (n = 614), and “probable AD”, with only diagnosis (n = 2,026). We trained and validated random forest, support vector machine, and logistic regression models to predict incident AD in the 1, 2, 3, and 4 subsequent years. For predicting future incidence of AD in balanced samples (bootstrapping), the machine learning models showed reasonable performance in 1-year prediction, with AUCs of 0.775 and 0.759 based on the “definite AD” and “probable AD” outcomes, respectively; in 2-year prediction, 0.730 and 0.693; in 3-year, 0.677 and 0.644; in 4-year, 0.725 and 0.683. The results were similar when the entire (unbalanced) samples were used. Important clinical features selected in logistic regression included hemoglobin level, age, and urine protein level. This study may shed light on the utility of data-driven machine learning models based on large-scale administrative health data for AD risk prediction, which may enable better selection of individuals at risk for AD in clinical trials or early detection in clinical settings.
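
The abstract reports AUC on balanced samples and on the entire unbalanced sample. As a rough illustration of those two evaluation settings, the sketch below computes AUC as a pairwise ranking statistic on synthetic scores, once over repeated balanced draws and once over the full sample. It is not the authors' pipeline: the abstract does not describe the sampling procedure, so random undersampling of controls is an assumption here, and the helper names (`balanced_sample`, `auc`), the 1.5% prevalence, and the Gaussian risk scores are all hypothetical.

```python
import random


def balanced_sample(labels, rng):
    """Return indices with equal numbers of cases (1) and controls (0),
    drawn by randomly undersampling the larger class (an assumption)."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n = min(len(pos), len(neg))
    return rng.sample(pos, n) + rng.sample(neg, n)


def auc(labels, scores):
    """AUC as the fraction of case/control pairs in which the case has
    the higher score; tied pairs count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Synthetic stand-in for model output: rare outcome, cases score higher on average.
rng = random.Random(0)
labels = [1 if rng.random() < 0.015 else 0 for _ in range(5000)]
scores = [rng.gauss(1.0 if y else 0.0, 1.0) for y in labels]

# Average AUC over repeated balanced draws, then AUC on the full sample.
aucs = []
for _ in range(20):
    idx = balanced_sample(labels, rng)
    aucs.append(auc([labels[i] for i in idx], [scores[i] for i in idx]))
mean_auc = sum(aucs) / len(aucs)
full_auc = auc(labels, scores)

print(f"mean AUC over balanced draws: {mean_auc:.3f}")
print(f"AUC on full unbalanced sample: {full_auc:.3f}")
```

Because AUC depends only on how cases rank against controls, it is largely insensitive to class prevalence, which is consistent with the abstract's remark that balanced and unbalanced results were similar.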