Moving beyond regression techniques in cardiovascular risk prediction: applying machine learning to address analytic challenges

Cited by: 383
Authors
Goldstein, Benjamin A. [1, 2]
Navar, Ann Marie [2 ]
Carter, Rickey E. [3 ]
Affiliations
[1] Duke Univ, Dept Biostat & Bioinformat, 2424 Erwin Rd,Suite 1104, Durham, NC 27705 USA
[2] Duke Clin Res Inst, Ctr Predict Med, Durham, NC 27705 USA
[3] Mayo Clin, Div Biomed Stat & Informat, Dept Hlth Sci Res, Rochester, MN USA
Keywords
Electronic health records; Risk prediction; Precision medicine; Personalized medicine; DISEASE; SELECTION; MODEL; SCORE; CLASSIFICATION; SIMULATION; ACCURACY; EVENTS
DOI
10.1093/eurheartj/ehw302
Chinese Library Classification (CLC) number
R5 [Internal Medicine]
Subject classification code
1002; 100201
Abstract
Risk prediction plays an important role in clinical cardiology research. Traditionally, most risk models have been based on regression models. While useful and robust, these statistical methods are limited to using a small number of predictors which operate in the same way on everyone, and uniformly throughout their range. The purpose of this review is to illustrate the use of machine-learning methods for development of risk prediction models. Typically presented as black-box approaches, most machine-learning methods are aimed at solving particular challenges that arise in data analysis that are not well addressed by typical regression approaches. To illustrate these challenges, as well as how different methods can address them, we consider trying to predict mortality after diagnosis of acute myocardial infarction. We use data derived from our institution's electronic health record and abstract data on 13 regularly measured laboratory markers. We walk through different challenges that arise in modelling these data and then introduce different machine-learning approaches. Finally, we discuss general issues in the application of machine-learning methods including tuning parameters, loss functions, variable importance, and missing data. Overall, this review serves as an introduction for those working on risk modelling to approach the diffuse field of machine learning.
Pages: 1805-1814
Page count: 10