Predicting cardiovascular risk from national administrative databases using a combined survival analysis and deep learning approach

Times cited: 20
Authors
Barbieri, Sebastiano [1]
Mehta, Suneela [2]
Wu, Billy [2]
Bharat, Chrianna [3]
Poppe, Katrina [2]
Jorm, Louisa [1]
Jackson, Rod [2]
Affiliations
[1] Centre for Big Data Research in Health, University of New South Wales, Sydney, NSW, Australia
[2] Section of Epidemiology and Biostatistics, University of Auckland, Auckland, New Zealand
[3] National Drug and Alcohol Research Centre, University of New South Wales, Sydney, NSW, Australia
Keywords
Cardiovascular diseases; primary prevention; risk assessment; population health; health planning; machine learning; deep learning; survival analysis
DOI
10.1093/ije/dyab258
Chinese Library Classification (CLC) number
R1 [Preventive Medicine, Hygiene]
Subject classification codes
1004; 120402
Abstract
Background: Machine learning-based risk prediction models may outperform traditional statistical models in large datasets with many variables, by identifying both novel predictors and the complex interactions between them. This study compared deep learning extensions of survival analysis models with Cox proportional hazards models for predicting cardiovascular disease (CVD) risk in national health administrative datasets.

Methods: Using individual person linkage of administrative datasets, we constructed a cohort of all New Zealanders aged 30–74 who interacted with public health services during 2012. After excluding people with prior CVD, we developed sex-specific deep learning and Cox proportional hazards models to estimate the risk of CVD events within 5 years. Models were compared based on the proportion of explained variance, model calibration and discrimination, and hazard ratios for predictor variables.

Results: First CVD events occurred in 61 927 of 2 164 872 people. Within the reference group, the largest hazard ratios estimated by the deep learning models were for tobacco use in women (2.04, 95% CI: 1.99, 2.10) and chronic obstructive pulmonary disease with acute lower respiratory infection in men (1.56, 95% CI: 1.50, 1.62). Other identified predictors (e.g. hypertension, chest pain, diabetes) aligned with current knowledge about CVD risk factors. Deep learning outperformed Cox proportional hazards models on the basis of proportion of explained variance (R²: 0.468 vs 0.425 in women and 0.383 vs 0.348 in men), calibration and discrimination (all P < 0.0001).

Conclusions: Deep learning extensions of survival analysis models can be applied to large health administrative datasets to derive interpretable CVD risk prediction equations that are more accurate than traditional Cox proportional hazards models.
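The abstract compares Cox proportional hazards models with deep learning extensions of them, partly on discrimination. As a hedged sketch (not the authors' code), the Python below shows two standard building blocks. The first is the Cox negative log partial likelihood with the Breslow approximation for tied times: a linear Cox model minimizes it over the coefficients, and DeepSurv-style networks commonly use it as a training loss on the network's log-risk output. The second is Harrell's C-index, one common discrimination statistic. The abstract does not say which loss, tie handling, or discrimination measure the authors used; the function names and toy data are hypothetical.

```python
import math
from typing import Sequence


def neg_log_partial_likelihood(
    risk: Sequence[float], times: Sequence[float], events: Sequence[int]
) -> float:
    """Mean negative Cox log partial likelihood, Breslow handling of tied times.

    `risk` is the log-hazard score eta_i: beta . x_i for a linear Cox model,
    or the network output for a deep (DeepSurv-style) extension (assumption).
    """
    total, n_events = 0.0, 0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored people contribute only through risk sets
        # Risk set: everyone still under observation at t_i (Breslow: ties included).
        at_risk = [r for r, t in zip(risk, times) if t >= t_i]
        m = max(at_risk)  # log-sum-exp shift for numerical stability
        log_denom = m + math.log(sum(math.exp(r - m) for r in at_risk))
        total += risk[i] - log_denom
        n_events += 1
    return -total / n_events if n_events else 0.0


def harrell_c_index(
    risk: Sequence[float], times: Sequence[float], events: Sequence[int]
) -> float:
    """Harrell's C: share of comparable pairs in which the person with the
    earlier observed event has the higher risk score (score ties count 0.5)."""
    concordant, comparable = 0.0, 0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # a pair is comparable only if the earlier time is an event
        for j, t_j in enumerate(times):
            if t_j > t_i:  # person j was still event-free after t_i
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")


if __name__ == "__main__":
    times = [1.0, 2.0, 3.0, 4.0]  # hypothetical follow-up times in years
    events = [1, 1, 0, 1]         # 1 = CVD event observed, 0 = censored
    risk = [2.0, 1.0, 0.5, 0.0]   # higher score = higher predicted hazard
    print(harrell_c_index(risk, times, events))  # 1.0
    print(neg_log_partial_likelihood(risk, times, events))
```

Run as a script, it prints 1.0 for the C-index because every comparable pair in the toy data is ranked correctly. In a linear Cox model, a coefficient β for a binary predictor maps to a hazard ratio exp(β); the reported 2.04 for tobacco use in women, for instance, corresponds to β = ln 2.04 ≈ 0.71.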
Pages: 933–944
Page count: 12