A comparison of penalised regression methods for informing the selection of predictive markers

Cited by: 41
Authors
Greenwood, Christopher J. [1, 2]
Youssef, George J. [1, 2]
Letcher, Primrose [3]
Macdonald, Jacqui A. [1, 2]
Hagg, Lauryn J. [1]
Sanson, Ann [3]
McIntosh, Jenn [1, 2]
Hutchinson, Delyse M. [1, 2, 3, 4]
Toumbourou, John W. [1]
Fuller-Tyszkiewicz, Matthew [1]
Olsson, Craig A. [1, 2, 3]
Affiliations
[1] Deakin Univ, Sch Psychol, Fac Hlth, Ctr Social & Early Emot Dev, Geelong, Vic, Australia
[2] Murdoch Childrens Res Inst, Ctr Adolescent Hlth, Melbourne, Vic, Australia
[3] Univ Melbourne, Royal Childrens Hosp, Dept Paediat, Melbourne, Vic, Australia
[4] Univ New South Wales, Natl Drug & Alcohol Res Ctr, Fac Med, Randwick, NSW, Australia
Funding
UK Medical Research Council; Australian Research Council
Keywords
VARIABLE SELECTION; REGULARIZATION; DELINQUENCY; CHILDHOOD; INVENTORY; ANXIETY;
DOI
10.1371/journal.pone.0242730
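Chinese Library Classification (CLC) number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Discipline classification code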
07 ; 0710 ; 09 ;
Abstract
Background: Penalised regression methods are a useful atheoretical approach for both developing predictive models and selecting key indicators within an often substantially larger pool of available indicators. In comparison to traditional methods, penalised regression models improve prediction in new data by shrinking the size of coefficients and retaining those with coefficients greater than zero. However, the performance and selection of indicators depends on the specific algorithm implemented. The purpose of this study was to examine the predictive performance and feature (i.e., indicator) selection capability of common penalised logistic regression methods (LASSO, adaptive LASSO, and elastic-net), compared with traditional logistic regression and forward selection methods.

Design: Data were drawn from the Australian Temperament Project, a multigenerational longitudinal study established in 1983. The analytic sample consisted of 1,292 (707 women) participants. A total of 102 adolescent psychosocial and contextual indicators were available to predict young adult daily smoking.

Findings: Penalised logistic regression methods showed small improvements in predictive performance over logistic regression and forward selection. However, no single penalised logistic regression model outperformed the others. Elastic-net models selected more indicators than either LASSO or adaptive LASSO. Additionally, more regularised models included fewer indicators, yet had comparable predictive performance. Forward selection methods dismissed many indicators identified as important in the penalised logistic regression models.

Conclusions: Although overall predictive accuracy was only marginally better with penalised logistic regression methods, benefits were most clear in their capacity to select a manageable subset of indicators. Preference to competing penalised logistic regression methods may therefore be guided by feature selection capability, and thus interpretative considerations, rather than predictive performance alone.
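
As an illustration of the shrinkage mechanism the abstract refers to, the following is a minimal sketch and not the authors' analysis. It assumes a linear model with an orthonormal design, where the L1-penalised solution reduces to soft-thresholding the unpenalised estimates and the (naive) elastic-net solution adds a further rescaling, under the objective ½‖y − Xβ‖² + λ₁‖β‖₁ + (λ₂/2)‖β‖². Penalised logistic regression, as used in the study, has no such closed form and is fitted iteratively. The function names and the coefficient values are hypothetical.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; anything within [-lam, lam] becomes exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_orthonormal(ols_coefs, lam):
    """LASSO coefficients under the orthonormal-design assumption."""
    return [soft_threshold(b, lam) for b in ols_coefs]


def elastic_net_orthonormal(ols_coefs, lam1, lam2):
    """Naive elastic-net coefficients under the same assumption:
    soft-threshold by the L1 penalty, then rescale by the L2 penalty."""
    return [soft_threshold(b, lam1) / (1.0 + lam2) for b in ols_coefs]


def n_selected(coefs):
    """Count indicators retained, i.e. those with a non-zero coefficient."""
    return sum(1 for b in coefs if b != 0.0)


# Hypothetical unpenalised estimates for five indicators.
ols = [2.0, -1.5, 0.4, -0.2, 0.05]

weak = lasso_orthonormal(ols, 0.1)    # light regularisation
strong = lasso_orthonormal(ols, 0.5)  # heavier regularisation
enet = elastic_net_orthonormal(ols, 0.5, 1.0)

print(n_selected(weak), n_selected(strong), n_selected(enet))
```

In this toy setting a larger penalty retains fewer indicators, which mirrors the abstract's observation that more regularised models included fewer indicators. It does not reproduce the finding that elastic-net selected more indicators than LASSO, which depends on correlated predictors and on how the penalties are tuned.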
Pages: 14