Epidemiological predictive modeling: lessons learned from the Kuopio ischemic heart disease risk factor study

Cited by: 2
Authors
Brester, Christina [1]
Voutilainen, Ari [2]
Tuomainen, Tomi-Pekka [2]
Kauhanen, Jussi [2]
Kolehmainen, Mikko [1]
Affiliations
[1] University of Eastern Finland, Department of Environmental and Biological Sciences, Yliopistonranta 1 E, P.O. Box 1627, FI-70211 Kuopio, Finland
[2] University of Eastern Finland, Institute of Public Health and Clinical Nutrition, Kuopio, Finland
Keywords
Machine learning; Prediction of cardiovascular death; Population study; Epidemiology; Big data
DOI
10.1016/j.annepidem.2022.03.010
Chinese Library Classification (CLC) number
R1 [Preventive medicine, hygiene]
Subject classification codes
1004; 120402
Abstract
Purpose: The use of predictive models in epidemiology remains relatively limited, as most studies report results of traditional statistical models such as linear, logistic, or Cox regression. In this study, a high-dimensional epidemiological cohort collected within the Kuopio Ischemic Heart Disease Risk Factor Study in 1984-1989 was used to investigate the predictive ability of models with embedded variable selection.

Methods: Simple Logistic Regression with seven preselected risk factors was compared with k-Nearest Neighbors, Logistic Lasso Regression, Decision Tree, Random Forest, and Multilayer Perceptron in predicting cardiovascular death among aged men from the Kuopio Ischemic Heart Disease Risk Factor Study over a long horizon of 30 ± 3 years. In total, 746 predictor variables were available for 2682 men, among whom 705 cardiovascular deaths were registered. We considered two scenarios for handling competing risks: removing subjects with a competing event, or treating them as non-cases.

Results: The best average AUC on the test sample, achieved with Logistic Lasso Regression, was 0.8075 (95% CI, 0.8051-0.8099) in scenario 1 and 0.7155 (95% CI, 0.7128-0.7183) in scenario 2. These values were 6.04% and 5.50% higher, respectively, than the baseline AUC of Logistic Regression with manually preselected predictors.

Conclusions: In both scenarios, Logistic Lasso Regression, Random Forest, and Multilayer Perceptron outperformed Simple Logistic Regression.
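The two ways of handling competing risks mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the study's code. The sketch makes several assumptions:

- Each subject has a hypothetical outcome code (`cvd_death`, `other_death`, `alive`).
- A risk score has already been produced by some fitted model.
- Following the order in which the abstract lists the scenarios, scenario 1 removes subjects with a competing event and scenario 2 keeps them as non-cases.

The helper names `make_labels` and `auc` are illustrative. AUC is computed from its rank (Mann-Whitney) definition, without the confidence intervals reported in the paper.

```python
from typing import List, Sequence, Tuple

# Hypothetical outcome codes (an assumption for illustration, not the study's coding).
CVD_DEATH = "cvd_death"      # event of interest: cardiovascular death
OTHER_DEATH = "other_death"  # competing risk: death from another cause
ALIVE = "alive"              # no death within the follow-up horizon


def make_labels(outcomes: Sequence[str], scenario: int) -> Tuple[List[int], List[int]]:
    """Return (indices of subjects kept, binary labels) for one scenario.

    Scenario 1 (assumed): subjects with a competing event are removed.
    Scenario 2 (assumed): subjects with a competing event are kept as non-cases (label 0).
    """
    if scenario not in (1, 2):
        raise ValueError("scenario must be 1 or 2")
    kept: List[int] = []
    labels: List[int] = []
    for i, outcome in enumerate(outcomes):
        if outcome == OTHER_DEATH and scenario == 1:
            continue  # drop competing-risk subjects in scenario 1
        kept.append(i)
        labels.append(1 if outcome == CVD_DEATH else 0)
    return kept, labels


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Rank (Mann-Whitney) AUC: the share of case/non-case pairs in which
    the case gets the higher score; ties count as one half. O(n^2), for clarity."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


# Toy data: five subjects with made-up risk scores from any fitted model.
outcomes = [CVD_DEATH, OTHER_DEATH, ALIVE, CVD_DEATH, ALIVE]
scores = [0.9, 0.7, 0.2, 0.6, 0.4]

for scenario in (1, 2):
    kept, labels = make_labels(outcomes, scenario)
    print(scenario, round(auc([scores[i] for i in kept], labels), 4))
# prints:
# 1 1.0
# 2 0.8333
```

In this made-up data, the subject who died of another cause has a fairly high score. Counting that subject as a non-case in scenario 2 therefore lowers the AUC from 1.0 to 0.8333. This only shows how the choice of scenario can shift the metric. It says nothing about the study's own data or models.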
Pages: 1-8
Number of pages: 8