Outlier Removal with Weight Penalization and Aggregation: A Robust Variable Selection Method for Enhancing Near-Infrared Spectral Analysis Performance

Cited by: 1

Authors:

Li, Beibei [1]

Li, Wenting [1]

Guo, Junwei [1]

Wang, Hongbo [1]

Wan, Ran [1]

Liu, Yu [1]

Fan, Meijuan [1]

Wang, Cong [1]

Yang, Song [1]

Zhao, Le [1]

Nie, Cong [1]

Affiliations:

[1] CNTC, Zhengzhou Tobacco Res Inst, Lab Tobacco Chem, Zhengzhou 450001, Peoples R China

Source:

ANALYTICAL CHEMISTRY | 2025 / Vol. 97 / Issue 13

Keywords:

MULTIVARIATE CALIBRATION; SUBSET-SELECTION; PLS-REGRESSION; SPECTROSCOPY; ELIMINATION; ALGORITHMS; PREDICTION; STRATEGY; TOOL;

DOI:

10.1021/acs.analchem.4c07007

Chinese Library Classification (CLC) number:

O65 [Analytical Chemistry]

Discipline classification codes:

070302; 081704

Abstract:

Full-wavelength near-infrared (NIR) spectroscopy faces significant challenges due to the strong collinearity among spectral variables and the presence of variables that are highly sensitive to sample fluctuations. Additionally, not all spectral variables contribute equally to the NIR model. Weakly influential variables, although not important on their own, can provide substantial improvement when combined with stronger variables, thus increasing both model stability and prediction accuracy. Therefore, this study proposes a new variable selection method called outlier removal with weight penalization and aggregation (OR-WPA). The method begins by removing outlier spectral variables with a high coefficient of variation, which enhances model stability. During the variable selection process, multiple submodels are constructed based on variable subsets, with variable weights assigned according to the absolute values of regression coefficients. A moving window is applied to average the weights, and variables with excessively high weights are penalized, promoting the selection of weakly influential variables that positively contribute to model accuracy. The variable space is iteratively reduced, and the subset of variables associated with the highest predictive accuracy is selected as the final characteristic variable combination. The OR-WPA method was evaluated on three NIR spectral data sets covering corn, heated tobacco substrate, and flue-cured tobacco. The results were compared with those of three advanced variable selection methods: Monte Carlo uninformative variable elimination, competitive adaptive reweighted sampling, and bootstrapping soft shrinkage. The results indicate that OR-WPA demonstrates better predictive performance, particularly in predicting low-content components, where it significantly enhances both the accuracy and stability of the NIR model.
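The abstract names two numerical steps that are easy to illustrate: filtering variables by coefficient of variation, and smoothing then penalizing variable weights. The following is a minimal sketch of those two steps only, not the authors' implementation. The function names, the quantile-based thresholds (`cv_quantile`, `cap_quantile`), the window size, and the choice of clipping as the penalty are all assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np


def remove_high_cv_variables(X, cv_quantile=0.95):
    """Return a boolean mask of columns (spectral variables) to keep.

    A column is kept when its coefficient of variation (std / |mean|)
    does not exceed the chosen quantile of all column CVs.
    The quantile threshold is a hypothetical choice for this sketch.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    # Guard against division by zero for columns with a mean near zero.
    cv = std / np.maximum(np.abs(mean), 1e-12)
    return cv <= np.quantile(cv, cv_quantile)


def smooth_and_penalize(coefficients, window=3, cap_quantile=0.9):
    """Turn regression coefficients into smoothed, capped sampling weights.

    Steps (all assumptions beyond what the abstract states):
    1. weight = absolute value of the regression coefficient;
    2. moving-window average (zero-padded at the edges);
    3. clip weights above a quantile cap, so dominant variables
       do not crowd out weakly influential ones;
    4. normalize so the weights sum to 1.
    """
    w = np.abs(np.asarray(coefficients, dtype=float))
    kernel = np.ones(window) / window
    smoothed = np.convolve(w, kernel, mode="same")
    cap = np.quantile(smoothed, cap_quantile)
    penalized = np.minimum(smoothed, cap)
    return penalized / penalized.sum()
```

In a full procedure these weights would presumably drive the sampling of variable subsets for the next round of submodels, with the variable space shrinking at each iteration; that loop and the regression model itself are left out here because the abstract gives no details on them.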


Pages: 7325-7332

Page count: 8
