Super learner

被引:812
作者
van der Laan, Mark J. [1 ]
Polley, Eric C. [1 ]
Hubbard, Alan E. [1 ]
机构
[1] Univ Calif Berkeley, Berkeley, CA 94720 USA
关键词
cross-validation; loss-based estimation; machine learning; prediction; REGRESSION;
D O I
10.2202/1544-6115.1309
中图分类号
Q5 [生物化学]; Q7 [分子生物学];
学科分类号
071010 ; 081704 ;
摘要
When trying to learn a model for the prediction of an outcome given a set of covariates, a statistician has many estimation procedures in their toolbox. A few examples of these candidate learners are: least squares, least angle regression, random forests, and spline regression. Previous articles (van der Laan and Dudoit (2003); van der Laan et al. (2006); Sinisi et al. (2007)) theoretically validated the use of cross validation to select an optimal learner among many candidate learners. Motivated by this use of cross validation, we propose a new prediction method for creating a weighted combination of many candidate learners to build the super learner. This article proposes a fast algorithm for constructing a super learner in prediction which uses V-fold cross-validation to select weights to combine an initial set of candidate learners. In addition, this paper contains a practical demonstration of the adaptivity of this so called super learner to various true data generating distributions. This approach for construction of a super learner generalizes to any parameter which can be defined as a minimizer of a loss function.
引用
收藏
页数:22
相关论文
共 16 条
[1]  
[Anonymous], 2003, Technical Report 130
[2]   SmcHD1, containing a structural-maintenance-of-chromosomes hinge domain, has a critical role in X inactivation [J].
Blewitt, Marnie E. ;
Gendrel, Anne-Valerie ;
Pang, Zhenyi ;
Sparrow, Duncan B. ;
Whitelaw, Nadia ;
Craig, Jeffrey M. ;
Apedaile, Anwyn ;
Hilton, Douglas J. ;
Dunwoodie, Sally L. ;
Brockdorff, Neil ;
Kay, Graham F. ;
Whitelaw, Emma .
NATURE GENETICS, 2008, 40 (05) :663-669
[3]   Random forests [J].
Breiman, L .
MACHINE LEARNING, 2001, 45 (01) :5-32
[4]  
Dudoit S., 2003, Statistical Methodol, V2, P131
[5]   Least angle regression - Rejoinder [J].
Efron, B ;
Hastie, T ;
Johnstone, I ;
Tibshirani, R .
ANNALS OF STATISTICS, 2004, 32 (02) :494-499
[6]  
FREUND Y, 1997, 29 ANN ACM S THEOR C
[7]   MULTIVARIATE ADAPTIVE REGRESSION SPLINES [J].
FRIEDMAN, JH .
ANNALS OF STATISTICS, 1991, 19 (01) :1-67
[8]  
Hansen J.V., 1998, ICCIN
[9]   RIDGE REGRESSION - BIASED ESTIMATION FOR NONORTHOGONAL PROBLEMS [J].
HOERL, AE ;
KENNARD, RW .
TECHNOMETRICS, 1970, 12 (01) :55-&
[10]  
RHEE J, 2006, P NATL ACAD SCI US