The Evolution of Boosting Algorithms - From Machine Learning to Statistical Modelling

Cited by: 239
Authors
Mayr, A. [1]
Binder, H. [2]
Gefeller, O. [1]
Schmid, M. [1,3]
Affiliations
[1] Friedrich Alexander Univ Erlangen Nurnberg FAU, Inst Medizininformat Biomet & Epidemiol, D-91054 Erlangen, Germany
[2] Johannes Gutenberg Univ Mainz, Inst Med Biomet Epidemiol & Informat, Mainz, Germany
[3] Univ Bonn, Inst Med Biomet Informat & Epidemiol, Bonn, Germany
Keywords
Statistical computing; statistical models; algorithms; classification; machine learning
DOI
10.3414/ME13-01-0122
Chinese Library Classification (CLC) number
TP [automation technology, computer technology]
Discipline classification code
0812
Abstract
Background: The concept of boosting emerged from the field of machine learning. The basic idea is to boost the accuracy of a weak classifying tool by combining various instances into a more accurate prediction. This general concept was later adapted to the field of statistical modelling. Nowadays, boosting algorithms are often applied to estimate and select predictor effects in statistical regression models.
Objectives: This review article attempts to highlight the evolution of boosting algorithms from machine learning to statistical modelling.
Methods: We describe the AdaBoost algorithm for classification as well as the two most prominent statistical boosting approaches, gradient boosting and likelihood-based boosting for statistical modelling. We highlight the methodological background and present the most common software implementations.
Results: Although gradient boosting and likelihood-based boosting are typically treated separately in the literature, they share the same methodological roots and follow the same fundamental concepts. Compared to the initial machine learning algorithms, which must be seen as black-box prediction schemes, they result in statistical models with a straightforward interpretation.
Conclusions: Statistical boosting algorithms have gained substantial interest during the last decade and offer a variety of options to address important research questions in modern biomedicine.
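As a purely illustrative aside (not taken from the article, which discusses dedicated software implementations), the sketch below shows the core idea of component-wise gradient boosting with squared-error loss and simple linear base-learners: in every iteration the negative gradient (here the residuals) is refitted by each candidate base-learner, only the best-fitting predictor effect is updated by a small step length, and the final fit is an interpretable additive combination of selected predictor effects. All function and variable names in the snippet are hypothetical.

```python
import numpy as np

def componentwise_l2_boost(X, y, n_iter=300, nu=0.1):
    """Illustrative component-wise L2-boosting (gradient boosting with
    squared-error loss and univariate linear base-learners)."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)        # centre predictors: slope-only base-learners
    intercept = y.mean()           # offset = optimal constant under squared error
    coef = np.zeros(p)             # coefficients of the additive predictor
    f = np.full(n, intercept)      # current fit

    for _ in range(n_iter):
        u = y - f                                        # negative gradient = residuals
        slopes = Xc.T @ u / (Xc ** 2).sum(axis=0)        # least-squares fit per predictor
        sse = ((u[:, None] - Xc * slopes) ** 2).sum(axis=0)
        j = int(np.argmin(sse))                          # best-fitting component
        coef[j] += nu * slopes[j]                        # weak (shrunken) update of one effect
        f += nu * slopes[j] * Xc[:, j]

    return intercept, coef          # coefficients refer to mean-centred predictors


# toy example: only the first two of ten predictors carry signal
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=200)
intercept, coef = componentwise_l2_boost(X, y)
print(np.round(coef, 2))            # uninformative predictors stay (close to) zero
```

With early stopping (a small number of iterations), many coefficients remain exactly zero, which reflects the variable-selection behaviour the abstract refers to; mature implementations additionally tune the stopping iteration, for example by cross-validation.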
Pages: 419-427
Number of pages: 9