The successive projections algorithm for spectral variable selection in classification problems

Cited by: 159
Authors
Pontes, MJC [1]
Galvao, RKH [1]
Araújo, MCU [1]
Nogueira, P [1]
Moreira, T [1]
Neto, ODP [1]
José, GE [1]
Saldanha, TCB [1]
Affiliations
[1] Univ Fed Paraiba, Dept Quim, CCEN, BR-58051970 Joao Pessoa, Paraiba, Brazil
Keywords
successive projections algorithm; classification; linear discriminant analysis; genetic algorithm; SIMCA; UV-VIS spectrometry; NIR spectrometry; vegetable oils; diesel
DOI
10.1016/j.chemolab.2004.12.001
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
The Successive Projections Algorithm (SPA) has been shown to be a useful tool for variable selection in the framework of multivariate calibration. In this paper, the collinearity minimization role of SPA is exploited in the context of classification methods for which collinearity is a known cause of generalization problems. For this purpose, a cost function associated with the average risk of misclassification by Linear Discriminant Analysis (LDA) is used to guide SPA selection. The proposed approach is illustrated in two classification problems. The first problem involves four types of vegetable oils (corn, soya, canola, sunflower). In this case, UV-VIS spectrometry is adopted to emphasize the ability of SPA-LDA to deal with low-resolution spectra with strong overlapping, which are associated with the wide absorption bands in this region. In the second problem, NIR spectrometry is employed to discriminate diesel samples with respect to the concentration level of sulphur. This application illustrates the use of SPA-LDA in a large-scale variable selection scenario. In these two examples, SPA-LDA is compared with the commonly used SIMCA classification method, as well as with a genetic algorithm (GA). The results show that SPA-LDA is superior to SIMCA and comparable to GA-LDA with respect to classification accuracy in an independent prediction set. Moreover, SPA-LDA is found to be less sensitive to instrumental noise than GA-LDA. © 2005 Elsevier B.V. All rights reserved.
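The collinearity-minimizing projection step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the commonly described form of SPA: starting from one column of the data matrix, each new variable is the column with the largest norm after projection onto the subspace orthogonal to the variables already chosen. The function name `spa_chain` and its parameters are hypothetical, and the sketch does not reproduce the paper's LDA-based cost function, which would be used to choose the starting variable and the number of variables.

```python
import numpy as np

def spa_chain(X, start, n_select):
    """Return indices of n_select columns of X picked by successive projections.

    X        : (samples x variables) array, e.g. spectra by wavelength
    start    : index of the first column in the chain (assumed given)
    n_select : length of the chain; should not exceed the rank of X
    """
    P = np.asarray(X, dtype=float).copy()  # working copy; columns get projected
    selected = [start]
    for _ in range(n_select - 1):
        v = P[:, selected[-1]]
        # Remove from every column its component along the last picked column.
        # Assumes v is non-zero, i.e. the chain is not longer than the rank of X.
        P = P - np.outer(v, v @ P) / (v @ v)
        norms = np.linalg.norm(P, axis=0)
        norms[selected] = -1.0  # never pick the same column twice
        selected.append(int(np.argmax(norms)))
    return selected
```

In a full SPA-LDA procedure, chains like this would presumably be generated for several starting variables and lengths, and the candidate subsets compared with the misclassification-risk cost on a validation set; that selection loop is left out here.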
Pages: 11-18
Number of pages: 8