Comparison of the variable importance in projection (VIP) and of the selectivity ratio (SR) methods for variable selection and interpretation

Cited by: 455
Authors
Farres, Mireia [1]
Platikanov, Stefan [1]
Tsakovski, Stefan [2]
Tauler, Roma [1]
Affiliations
[1] CSIC, IDAEA, Dept Environm Chem, ES-08034 Barcelona, Spain
[2] Univ Sofia, Fac Chem, Dept Analyt Chem, Sofia 1164, Bulgaria
Funding
European Research Council
Keywords
variable importance in projection; selectivity ratio; variable selection; partial least squares; PARTIAL LEAST-SQUARES; MASS-SPECTRAL PROFILES; MICROARRAY DATA; REGRESSION; CLASSIFICATION; IDENTIFICATION; PERFORMANCE; INDEX; WATER; PANEL;
DOI
10.1002/cem.2736
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This study compares two variable selection methods in partial least squares regression (PLSR): the variable importance in projection (VIP) method and the selectivity ratio (SR) method. For this purpose, three different data sets were analysed: (a) physicochemical water quality parameters related to sensorial data, (b) gas chromatography-mass spectrometry (GC-MS) chemical (organic compound) profiles from fossil sea sediment samples related to sea surface temperature (SST) changes, and (c) exposed genes of Daphnia magna female samples related to their total offspring production. Correlation coefficients (r), levels of significance (p-value) and interpretation of the underlying experimental phenomena were used to discuss the best approach for variable selection in each case. For the water quality data set, the comparison showed that the SR method is more accurate for sensorial prediction. For the climate data set, when raw total ion current (TIC) GC-MS chromatograms were considered, variables selected using the VIP method were easier to interpret than those selected by the SR method. However, when only some chromatographic peak areas (concentrations) were considered, the SR method was more efficient for prediction, and the VIP method selected the most relevant variables for the interpretation of SST changes. Finally, for the transcriptomic data set, the SR method was again found to be more reliable for prediction purposes. Copyright © 2015 John Wiley & Sons, Ltd.
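The abstract names the two rankings without defining them. The sketch below shows how they are commonly computed, assuming the usual definitions: VIP from the PLS weight vectors weighted by the response variance each component explains, and SR as the ratio of explained to residual variance of each variable after target projection onto the normalised regression vector. The NIPALS PLS1 fit, the function names and the synthetic data are illustrative assumptions, not the authors' implementation or data.

```python
import numpy as np


def pls1_nipals(X, y, n_components):
    """Fit a single-response PLS model (PLS1) by NIPALS on centred data."""
    Xk = X - X.mean(axis=0)
    yk = y - y.mean()
    W, T, P, q = [], [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)        # unit-length weight vector
        t = Xk @ w                    # scores for this component
        tt = t @ t
        p = Xk.T @ t / tt             # X loadings
        qa = yk @ t / tt              # y loading (scalar)
        Xk = Xk - np.outer(t, p)      # deflate X
        yk = yk - qa * t              # deflate y
        W.append(w)
        T.append(t)
        P.append(p)
        q.append(qa)
    W, T, P, q = np.array(W).T, np.array(T).T, np.array(P).T, np.array(q)
    b = W @ np.linalg.solve(P.T @ W, q)   # regression vector for centred X
    return W, T, q, b


def vip_scores(W, T, q):
    """VIP per variable; assumes the columns of W have unit length."""
    ss = q**2 * np.sum(T**2, axis=0)      # y-variance explained per component
    n_vars = W.shape[0]
    return np.sqrt(n_vars * (W**2 @ ss) / ss.sum())


def selectivity_ratio(X, b):
    """Explained / residual variance per variable after target projection."""
    Xc = X - X.mean(axis=0)
    w_tp = b / np.linalg.norm(b)          # target-projection weights
    t_tp = Xc @ w_tp                      # target-projected scores
    p_tp = Xc.T @ t_tp / (t_tp @ t_tp)    # target-projected loadings
    X_tp = np.outer(t_tp, p_tp)
    explained = np.sum(X_tp**2, axis=0)
    residual = np.sum((Xc - X_tp)**2, axis=0)
    return explained / residual


# Synthetic example (assumption): only the first two variables drive y.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 6))
y = 2.0 * X[:, 0] - X[:, 1] + 0.1 * rng.normal(size=40)

W, T, q, b = pls1_nipals(X, y, n_components=2)
vip = vip_scores(W, T, q)
sr = selectivity_ratio(X, b)
print("VIP:", np.round(vip, 2))
print("SR: ", np.round(sr, 2))
```

Because the squared VIP scores average to one by construction, a cut-off of VIP > 1 is a common rule of thumb. SR has no such built-in scale, so its threshold is usually set separately, for example with an F-test.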
Pages: 528-536
Page count: 9