Ensemble methods and partial least squares regression

Cited by: 72
Authors
Mevik, BH [1]
Segtnan, VH [1]
Næs, T [1]
Affiliations
[1] Matforsk, N-1430 Ås, Norway
Keywords
ensemble methods; bootstrap aggregating (bagging); data augmentation; noise addition; partial least squares regression (PLSR)
DOI
10.1002/cem.895
CLC Classification (Chinese Library Classification)
TP [automation technology, computer technology]
Subject Classification Code
0812
Abstract
Recently there has been increased attention in the literature to the use of ensemble methods in multivariate regression and classification. These methods have been shown to have interesting properties for both regression and classification; in particular, they can improve the accuracy of unstable predictors. Ensemble methods have so far been little studied in situations that are common for calibration and prediction in chemistry, i.e. situations with a large number of collinear x-variables and few samples. Such situations are often approached with data compression methods such as principal component regression (PCR) or partial least squares regression (PLSR). The present paper investigates the properties of different types of ensemble methods used with PLSR on highly collinear x-data; bagging and data augmentation by simulated noise are studied, with a focus on the robustness of the calibrations. Both real and simulated data are used. The results show that ensembles trained on data with added noise can make PLSR robust against the type of noise added; in particular, the effects of sample temperature variations can be eliminated. Bagging does not seem to give any improvement over PLSR for small and intermediate numbers of components, but it is less sensitive to overfitting. Copyright © 2005 John Wiley & Sons, Ltd.
Pages: 498-507
Number of pages: 10
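
The two strategies described in the abstract are simple to prototype. Below is a minimal sketch, assuming Gaussian instrumental noise and scikit-learn's PLSRegression estimator; the function names (fit_plsr_ensemble, predict_ensemble) and default parameters (n_models, noise_sd) are illustrative choices, not taken from the paper. Each ensemble member is a PLSR model fitted either on a bootstrap resample of the calibration set (bagging) or on a noise-perturbed copy of the spectra (data augmentation), and the ensemble prediction is the average of the member predictions.

```python
# Illustrative sketch of bagged and noise-augmented PLSR ensembles
# (not the authors' implementation). Requires numpy and scikit-learn.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

def fit_plsr_ensemble(X, y, n_components=5, n_models=50,
                      mode="bagging", noise_sd=0.01, rng=None):
    """Fit an ensemble of PLSR models.

    mode="bagging": each member is trained on a bootstrap resample.
    mode="noise":   each member is trained on X with simulated i.i.d.
                    Gaussian noise added (data augmentation).
    """
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    models = []
    for _ in range(n_models):
        if mode == "bagging":
            idx = rng.integers(0, n, size=n)   # sample rows with replacement
            Xm, ym = X[idx], y[idx]
        else:  # "noise": perturb the spectra, keep the responses unchanged
            Xm, ym = X + rng.normal(0.0, noise_sd, size=X.shape), y
        model = PLSRegression(n_components=n_components)
        model.fit(Xm, ym)
        models.append(model)
    return models

def predict_ensemble(models, X):
    """Average the member predictions to get the ensemble prediction."""
    preds = np.column_stack([m.predict(X).ravel() for m in models])
    return preds.mean(axis=1)
```

In the noise-augmentation case, the kind of perturbation matters: the paper's finding is that the ensemble becomes robust against the type of noise it was trained on (e.g. temperature-induced spectral variation), so in practice the added perturbations would have to mimic the expected disturbance rather than the generic Gaussian noise used in this sketch.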