A systematic evaluation of the benefits and hazards of variable selection in latent variable regression. Part I. Search algorithm, theory and simulations

Cited by: 83
Authors
Baumann, K [1]
Albert, H [1]
von Korff, M [1]
Affiliation
[1] Univ Wurzburg, Dept Pharm, D-97074 Wurzburg, Germany
Keywords
cross-validation; variable selection; PLS; PCR; tabu search
DOI
10.1002/cem.730
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Variable selection is an extensively studied problem in chemometrics and in the area of quantitative structure-activity relationships (QSARs). Many search algorithms have already been compared. Less well studied is the influence of different objective functions on the prediction quality of the selected models. This paper investigates the performance of different cross-validation techniques as the objective function for variable selection in latent variable regression. The results are compared in terms of predictive ability, model size (number of variables) and model complexity (number of latent variables). It is shown that leave-multiple-out cross-validation with a large percentage of the data left out performs best. Since leave-multiple-out cross-validation is computationally expensive, a very efficient tabu search algorithm is introduced to lower the computational burden. The tabu search algorithm needs no user-defined operational parameters and optimizes the variable subset and the number of latent variables simultaneously. Copyright (C) 2002 John Wiley & Sons, Ltd.
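The abstract describes using leave-multiple-out cross-validation as the objective of a search that chooses the variable subset and the number of latent variables together. The pure-Python sketch below illustrates that idea under stated assumptions. It is not the authors' algorithm. The model is PLS1 (single response) fitted by NIPALS-style deflation. The objective is the RMSE over a fixed set of random splits that leave out 50% of the samples, an assumed value for "a large percentage". The search is a textbook tabu search with single-variable add/remove moves, plus or minus one latent-variable moves, a tabu tenure and an aspiration criterion. The function names `pls1_fit`, `make_splits`, `lmo_cv_rmse` and `tabu_search` and the parameters `tenure`, `n_iter`, `n_splits` and `frac_out` are hypothetical. The paper's tabu search is described as needing no such user-defined parameters.

```python
import random


def pls1_fit(X, y, n_lv):
    """Fit PLS1 (one response) by NIPALS-style deflation; return a predict(x) closure."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # centred X residuals
    f = [yi - y_mean for yi in y]                                # centred y residuals
    comps = []  # (weight w, X-loading p, y-loading q) for each latent variable
    for _ in range(n_lv):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = sum(v * v for v in w) ** 0.5
        if norm < 1e-12:
            break  # nothing left in X that covaries with y
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(m)) for i in range(n)]
        tt = sum(v * v for v in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        comps.append((w, p, q))

    def predict(x):
        e = [x[j] - x_mean[j] for j in range(m)]
        y_hat = y_mean
        for w, p, q in comps:  # deflate the new sample exactly like the training data
            t = sum(e[j] * w[j] for j in range(m))
            y_hat += q * t
            e = [e[j] - t * p[j] for j in range(m)]
        return y_hat

    return predict


def make_splits(n, n_splits, frac_out, rng):
    """Fixed random leave-multiple-out splits (frac_out of samples left out), reused for all candidates."""
    n_out = max(1, round(frac_out * n))
    splits = []
    for _ in range(n_splits):
        idx = list(range(n))
        rng.shuffle(idx)
        splits.append((idx[n_out:], idx[:n_out]))  # (training indices, left-out indices)
    return splits


def lmo_cv_rmse(X, y, cols, n_lv, splits):
    """Objective: RMSE of left-out predictions over all leave-multiple-out splits."""
    def sub(idx):
        return [[X[i][j] for j in cols] for i in idx]

    sq_err, count = 0.0, 0
    for train, test in splits:
        model = pls1_fit(sub(train), [y[i] for i in train], n_lv)
        for i, row in zip(test, sub(test)):
            sq_err += (model(row) - y[i]) ** 2
            count += 1
    return (sq_err / count) ** 0.5


def tabu_search(X, y, max_lv=5, n_iter=30, tenure=3, n_splits=20, frac_out=0.5, seed=0):
    """Illustrative tabu search over (variable subset, number of LVs), not the paper's
    parameter-free algorithm; max_lv, n_iter, tenure, n_splits, frac_out are assumed defaults."""
    rng = random.Random(seed)
    m = len(X[0])
    splits = make_splits(len(X), n_splits, frac_out, rng)

    def score(subset, n_lv):
        return lmo_cv_rmse(X, y, sorted(subset), n_lv, splits)

    current, lv = frozenset(range(m)), min(max_lv, m)  # start from the full model
    best = (score(current, lv), current, lv)
    tabu_until = [0] * m  # toggling variable j is tabu while it <= tabu_until[j]
    for it in range(1, n_iter + 1):
        moves = []
        for j in range(m):  # add or remove one variable, clipping the LV count if needed
            s = current ^ {j}
            if not s:
                continue
            a = min(lv, len(s))
            sc = score(s, a)
            if it > tabu_until[j] or sc < best[0]:  # aspiration: a new best overrides tabu
                moves.append((sc, s, a, j))
        for a in (lv - 1, lv + 1):  # change the number of latent variables by one
            if 1 <= a <= min(max_lv, len(current)):
                moves.append((score(current, a), current, a, None))
        if not moves:
            break
        sc, current, lv, j = min(moves, key=lambda mv: mv[0])  # best admissible move, even if worse
        if j is not None:
            tabu_until[j] = it + tenure
        if sc < best[0]:
            best = (sc, current, lv)
    return sorted(best[1]), best[2], best[0]
```

The same split set is reused for every candidate, so all subsets and latent-variable counts are compared on identical data partitions. Accepting the best non-tabu move even when it is worse is what lets the search leave local optima. The best state ever visited is returned. The caller must keep `max_lv` below the number of training samples per split.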
Pages: 339-350
Page count: 12