Epidemiological predictive modeling: lessons learned from the Kuopio ischemic heart disease risk factor study

Cited by: 2

Authors:

Brester, Christina [1]

Voutilainen, Ari [2]

Tuomainen, Tomi-Pekka [2]

Kauhanen, Jussi [2]

Kolehmainen, Mikko [1]

Affiliations:

[1] Univ Eastern Finland, Dept Environm & Biol Sci, Yliopistonranta 1 E,POB 1627, FI-70211 Kuopio, Finland

[2] Univ Eastern Finland, Inst Publ Hlth & Clin Nutr, Kuopio, Finland

Source:

ANNALS OF EPIDEMIOLOGY | 2022 / Vol. 70

Keywords:

Machine learning; Prediction of cardiovascular death; Population study; Epidemiology; Big data

DOI:

10.1016/j.annepidem.2022.03.010

Chinese Library Classification:

R1 [Preventive Medicine, Hygiene]

Subject classification codes:

1004; 120402

Abstract:

Purpose: Predictive models see relatively narrow use in epidemiology, as most studies report results from traditional statistical models such as linear, logistic, or Cox regression. In this study, a high-dimensional epidemiological cohort, collected within the Kuopio Ischemic Heart Disease Risk Factor Study in 1984-1989, was used to investigate the predictive ability of models with embedded variable selection. Methods: Simple Logistic Regression with seven preselected risk factors was compared with k-Nearest Neighbors, Logistic Lasso Regression, Decision Tree, Random Forest, and Multilayer Perceptron in predicting cardiovascular death among the aged men of the Kuopio Ischemic Heart Disease Risk Factor Study over a long horizon of 30 ± 3 years. In total, 746 predictor variables were available for 2682 men, among whom 705 cardiovascular deaths were registered. We considered two scenarios of handling competing risks (removing subjects and treating them as non-cases). Results: The best average AUC on the test sample, achieved with Logistic Lasso Regression, was 0.8075 (95% CI, 0.8051-0.8099) in scenario 1 and 0.7155 (95% CI, 0.7128-0.7183) in scenario 2, which was 6.04% and 5.50% higher, respectively, than the baseline AUC provided by Logistic Regression with manually preselected predictors. Conclusions: In both scenarios, Logistic Lasso Regression, Random Forest, and Multilayer Perceptron outperformed Simple Logistic Regression.
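The two ways of handling competing risks mentioned in the abstract, and the AUC used to score the models, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the outcome codes, the helper names `build_labels` and `auc`, the toy data, and the scenario numbering (taken to follow the order in which the abstract lists the two options). It is not the authors' pipeline and does not reproduce any reported result.

```python
from typing import List, Sequence, Tuple

# Hypothetical outcome codes for this sketch (not taken from the study).
CVD_DEATH = "cvd"      # event of interest
OTHER_DEATH = "other"  # competing event
ALIVE = "alive"        # no event during follow-up


def build_labels(outcomes: Sequence[str], scenario: int) -> Tuple[List[int], List[int]]:
    """Return (kept_indices, binary_labels) for one competing-risk scenario.

    Assumed numbering: scenario 1 removes subjects with a competing event,
    scenario 2 keeps them and labels them as non-cases (0).
    """
    if scenario not in (1, 2):
        raise ValueError("scenario must be 1 or 2")
    kept: List[int] = []
    labels: List[int] = []
    for i, outcome in enumerate(outcomes):
        if outcome == OTHER_DEATH and scenario == 1:
            continue  # scenario 1: drop the subject entirely
        kept.append(i)
        labels.append(1 if outcome == CVD_DEATH else 0)
    return kept, labels


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (case, non-case) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one non-case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: six subjects with made-up outcomes and made-up predicted risks.
outcomes = [CVD_DEATH, OTHER_DEATH, ALIVE, CVD_DEATH, OTHER_DEATH, ALIVE]
risk_scores = [0.9, 0.8, 0.2, 0.6, 0.7, 0.3]

kept_s1, labels_s1 = build_labels(outcomes, scenario=1)
kept_s2, labels_s2 = build_labels(outcomes, scenario=2)

auc_s1 = auc(labels_s1, [risk_scores[i] for i in kept_s1])
auc_s2 = auc(labels_s2, [risk_scores[i] for i in kept_s2])

print(f"scenario 1: n={len(kept_s1)}, AUC={auc_s1:.2f}")
print(f"scenario 2: n={len(kept_s2)}, AUC={auc_s2:.2f}")
```

On this toy data the same risk scores yield a different AUC in each scenario, because the set of non-cases changes. The sketch only shows the mechanics of the two labeling choices; it makes no claim about why the AUCs reported in the abstract differ between scenarios.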
