Outlier Removal with Weight Penalization and Aggregation: A Robust Variable Selection Method for Enhancing Near-Infrared Spectral Analysis Performance

Times cited: 1
Authors
Li, Beibei [1]
Li, Wenting [1]
Guo, Junwei [1]
Wang, Hongbo [1]
Wan, Ran [1]
Liu, Yu [1]
Fan, Meijuan [1]
Wang, Cong [1]
Yang, Song [1]
Zhao, Le [1]
Nie, Cong [1]
Affiliation
[1] CNTC, Zhengzhou Tobacco Res Inst, Lab Tobacco Chem, Zhengzhou 450001, Peoples R China
Keywords
MULTIVARIATE CALIBRATION; SUBSET-SELECTION; PLS-REGRESSION; SPECTROSCOPY; ELIMINATION; ALGORITHMS; PREDICTION; STRATEGY; TOOL
DOI
10.1021/acs.analchem.4c07007
Chinese Library Classification (CLC) number
O65 [Analytical chemistry]
Discipline codes
070302; 081704
Abstract
Full-wavelength near-infrared (NIR) spectroscopy faces significant challenges due to the strong collinearity among spectral variables and the presence of variables that are highly sensitive to sample fluctuations. Additionally, not all spectral variables contribute equally to the NIR model. Weakly influential variables, although not important on their own, can provide substantial improvement when combined with stronger variables, thus increasing both model stability and prediction accuracy. Therefore, this study proposes a new variable selection method called outlier removal with weight penalization and aggregation (OR-WPA). The method begins by removing outlier spectral variables with a high coefficient of variation, which enhances model stability. During the variable selection process, multiple submodels are constructed on variable subsets, and each variable is assigned a weight according to the absolute values of its regression coefficients. A moving window is applied to average the weights, and variables with excessively high weights are penalized, promoting the selection of weakly influential variables that contribute positively to model accuracy. The variable space is iteratively reduced, and the subset of variables associated with the highest predictive accuracy is selected as the final characteristic variable combination. The OR-WPA method was evaluated on three NIR spectral data sets (corn, heated tobacco substrate, and flue-cured tobacco). The results were compared with those of three advanced variable selection methods: Monte Carlo uninformative variable elimination, competitive adaptive reweighted sampling, and bootstrapping soft shrinkage. The results indicate that OR-WPA delivers better predictive performance, particularly for low-content components, where it significantly enhances both the accuracy and the stability of the NIR model.
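The abstract describes the OR-WPA steps only at a high level, so the following is a minimal Python sketch of how those steps might fit together, not the authors' implementation. Every concrete choice here is an assumption: the CV threshold, averaging signed coefficients as absolute values across submodels, a centered moving window truncated at the spectrum ends, "penalization" implemented as capping weights at a multiple of the median, a fixed keep ratio per iteration, and the hypothetical callables `weight_fn` and `error_fn`, which stand in for the PLS submodel step and a cross-validated error such as RMSECV.

```python
"""Hypothetical sketch of OR-WPA-style variable weighting (not the authors' code).

Assumed choices: CV threshold, centered moving window, capping at a multiple
of the median as the "penalty", and a fixed keep ratio per iteration.
"""
from statistics import mean, median, pstdev


def coefficient_of_variation(column):
    """CV = population standard deviation / |mean| of one spectral variable."""
    m = mean(column)
    return pstdev(column) / abs(m) if m != 0 else float("inf")


def remove_outlier_variables(X, cv_threshold):
    """Indices of columns of X (samples x variables) whose CV <= cv_threshold."""
    columns = [[row[j] for row in X] for j in range(len(X[0]))]
    return [j for j, col in enumerate(columns)
            if coefficient_of_variation(col) <= cv_threshold]


def aggregate_weights(coefs, n_vars):
    """Mean |regression coefficient| per variable over all submodels.

    coefs holds one dict {position: regression coefficient} per submodel;
    a variable missing from a submodel contributes 0 for that submodel.
    """
    totals = [0.0] * n_vars
    for submodel in coefs:
        for j, value in submodel.items():
            totals[j] += abs(value)
    return [t / len(coefs) for t in totals]


def moving_window_average(weights, half_width):
    """Centered moving average; the window is truncated at both ends."""
    n = len(weights)
    smoothed = []
    for i in range(n):
        lo, hi = max(0, i - half_width), min(n, i + half_width + 1)
        smoothed.append(mean(weights[lo:hi]))
    return smoothed


def penalize_high_weights(weights, factor):
    """Cap every weight at factor * median so no single variable dominates."""
    cap = factor * median(weights)
    return [min(w, cap) for w in weights]


def sampling_probabilities(weights):
    """Normalize weights into probabilities for drawing the next submodel subsets."""
    total = sum(weights)
    return [w / total for w in weights]


def shrink_variable_space(indices, weights, keep_ratio):
    """Keep the highest-weighted fraction of variables (at least one).

    weights is parallel to indices; the kept indices are returned in order.
    """
    n_keep = max(1, round(len(indices) * keep_ratio))
    order = sorted(range(len(indices)), key=lambda k: weights[k], reverse=True)
    return sorted(indices[k] for k in order[:n_keep])


def select_best_subset(indices, weight_fn, error_fn, keep_ratio, n_iter):
    """Iteratively shrink the variable set; return (subset, error) with lowest error.

    weight_fn(subset) -> weights parallel to subset; error_fn(subset) -> error.
    Both are hypothetical placeholders for the PLS submodel and validation step.
    """
    current = list(indices)
    best_subset, best_error = current, error_fn(current)
    for _ in range(n_iter):
        if len(current) <= 1:
            break
        current = shrink_variable_space(current, weight_fn(current), keep_ratio)
        err = error_fn(current)
        if err < best_error:
            best_subset, best_error = current, err
    return best_subset, best_error
```

In a fuller workflow, `weight_fn` would fit many PLS submodels on variable subsets, pass the coefficients through `aggregate_weights`, then smooth them with `moving_window_average` and cap them with `penalize_high_weights`. Note that capping does not reorder variables in the deterministic top-k shrink step; in this sketch its effect shows up when the capped weights are normalized by `sampling_probabilities`, where weakly weighted variables gain a larger relative chance of being drawn into the next round of submodels (for example, capping `[1, 1, 10]` to `[1, 1, 2]` raises each weak variable's probability from 1/12 to 1/4).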
Pages: 7325-7332
Number of pages: 8