A new strategy of least absolute shrinkage and selection operator coupled with sampling error profile analysis for wavelength selection

Cited by: 34
Authors
Zhang, Ruoqiu [1]
Zhang, Feiyu [1]
Chen, Wanchao [1]
Yao, Heming [2]
Ge, Jiong [2]
Wu, Shengchao [2]
Wu, Ting [1]
Du, Yiping [1]
Affiliations
[1] East China Univ Sci & Technol, Sch Chem & Mol Engn, Shanghai Key Lab Funct Mat Chem, Shanghai 200237, Peoples R China
[2] Shanghai Tobacco Grp Co Ltd, Shanghai 200082, Peoples R China
Keywords
Wavelength selection; Vote rule; Least absolute shrinkage and selection operator; Sampling error profile analysis; Near-infrared spectroscopy; Model population analysis; Uninformative variable elimination; Spectral multivariate calibration; Tikhonov regularization; PLS-regression; Outlier detection; Angle regression; NIR spectra; Squares
DOI
10.1016/j.chemolab.2018.02.007
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
A new strategy, SEPA-LASSO, is proposed that combines sampling error profile analysis (SEPA) with the least absolute shrinkage and selection operator (LASSO). LASSO has proven effective for multivariate calibration of high-dimensional data because it performs variable selection automatically. In previous research, however, the critical step of LASSO calibration was the optimization of the L1-norm tuning parameter on a fixed sample set, without considering how variable selection behaves across different subsets of samples. In the present work, Monte Carlo sampling (MCS), the core of the SEPA framework, is used to investigate a variety of sub-models. Least angle regression (LAR) is used to solve LASSO, so that LAR iterations containing a given number of variables can be obtained directly instead of choosing numerical values of the L1-norm tuning parameter. The SEPA-LASSO algorithm consists of many loops. In each loop, under the SEPA framework and the LAR algorithm, a number of LASSO sub-models with the same dimension are built by MCS, and a vote rule is used to determine the importance of the variables and select them to form a variable subset. After the loops have run, several subsets of variables are obtained, and their error profiles are used to choose the optimal subset. The performance of SEPA-LASSO was evaluated on three near-infrared (NIR) datasets. The results show that the model built by SEPA-LASSO has excellent predictability and interpretability compared with commonly used multivariate calibration methods, such as principal component regression (PCR) and partial least squares (PLS), and with wavelength selection methods including LASSO, moving window partial least squares regression (MWPLSR), Monte Carlo uninformative variable elimination (MC-UVE), ordered homogeneity pursuit lasso (OHPL) and stability competitive adaptive reweighted sampling (SCARS).
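The single loop described in the abstract (Monte Carlo sampling of sub-models followed by a vote rule) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the sampling ratio, the number of sub-models and the synthetic data are all assumptions made for illustration, and the LAR-LASSO step is replaced by a simple stand-in that ranks variables by absolute covariance with the response. The final step, comparing the error profiles of several subsets, is not covered.

```python
import random
from collections import Counter


def top_k_by_covariance(X, y, rows, k):
    """Stand-in selector: rank variables by |covariance with y| on a subsample.

    Assumption: this replaces the LAR-LASSO step, which would instead stop
    the LAR path at the iteration that contains k variables.
    """
    n = len(rows)
    y_mean = sum(y[i] for i in rows) / n
    scores = []
    for j in range(len(X[0])):
        x_mean = sum(X[i][j] for i in rows) / n
        cov = sum((X[i][j] - x_mean) * (y[i] - y_mean) for i in rows) / n
        scores.append((abs(cov), j))
    # Indices of the k highest-scoring variables.
    return [j for _, j in sorted(scores, reverse=True)[:k]]


def vote_select(X, y, k, n_submodels=50, ratio=0.8, seed=0):
    """One loop: build sub-models of equal size by Monte Carlo sampling, then vote."""
    rng = random.Random(seed)
    n_rows = len(X)
    votes = Counter()
    for _ in range(n_submodels):
        # Draw a random subset of samples without replacement.
        rows = rng.sample(range(n_rows), int(ratio * n_rows))
        votes.update(top_k_by_covariance(X, y, rows, k))
    # Vote rule (assumed form): keep the k variables selected most often.
    return sorted(j for j, _ in votes.most_common(k))


def make_demo_data(n=100, p=8, seed=1):
    """Hypothetical synthetic data: only columns 2 and 5 influence y."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [3 * row[2] - 3 * row[5] + 0.1 * rng.gauss(0, 1) for row in X]
    return X, y


if __name__ == "__main__":
    X, y = make_demo_data()
    print(vote_select(X, y, k=2))
```

On the synthetic data, where only columns 2 and 5 drive the response, the vote should recover those two indices. Repeating `vote_select` for several values of `k` would give the candidate subsets that the abstract describes comparing through their error profiles.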
Pages: 47-54
Page count: 8