An Variable Selection Method of the Significance Multivariate Correlation Competitive Population Analysis for Near-Infrared Spectroscopy in Chemical Modeling

Cited by: 8
Authors
Wang, Yuxi [1]
Jia, Zhenhong [1]
Yang, Jie [2]
Affiliations
[1] Xinjiang Univ, Coll Informat Sci & Engn, Urumqi 830046, Peoples R China
[2] Shanghai Jiao Tong Univ, Inst Image Proc & Pattern Recognit, Shanghai 200240, Peoples R China
Funding
National Science Foundation (USA);
Keywords
Spectrochemical analysis; variable selection; the significant multivariate correlation; weighted bootstrap sampling; model population analysis; Monte Carlo sampling; analytical techniques; partial least squares method; PARTIAL LEAST-SQUARES; REGRESSION; SHRINKAGE; CALIBRATION; PROJECTION; STRATEGY; SPACE; OPTIMIZATION; PERSPECTIVE; WAVELENGTHS;
DOI
10.1109/ACCESS.2019.2954115
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
The high dimensionality of spectral datasets makes it difficult to select the optimal subset of variables. This paper presents a new method for variable selection called the significant multivariate competitive population analysis (SMCPA), which combines ideas of significant multivariate correlation (SMC) and model population analysis, and employs weighted bootstrap sampling (WBS) and exponential decline function (EDF) competition methods. In this study, the values of SMC distributions are used as an index for evaluating the importance of each wavelength. Then, based on the importance level of each wavelength, SMCPA sequentially selects N subsets of spectral wavelengths by N Monte Carlo sampling runs in an iterative and competitive procedure. In each sampling run, a fixed ratio of samples is used to build a calibrated partial least-squares (PLS) model, and then SMC is performed to obtain the score and threshold values. Next, based on the significant multivariate correlation scores, the key variables are selected in two steps: the compulsory selection of the exponential decline function and the competitive selection of adaptive weighted sampling. Finally, cross-validation (CV) is applied to select the optimal subset with the lowest root mean square error. This method is tested on three NIR spectral datasets and compared against three high-performance variable selection methods. The experimental results show that the proposed algorithm has the highest efficiency and the best selection effect, and can usually locate the optimal combination of key wavelength variables in a dataset. The evaluation result after PLS modeling is also the best.
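The two-step selection described in the abstract (forced shrinkage by an exponential decline function, then competitive weighted sampling) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract gives no formulas, so the EDF form below is an assumption borrowed from the commonly used CARS-style schedule, and the names `edf_ratio`, `weighted_bootstrap`, and `shrink_once` are hypothetical. The importance scores stand in for SMC values and are assumed to be non-negative; computing SMC itself from a PLS model is not shown.

```python
import numpy as np


def edf_ratio(i, n_runs, n_vars):
    """Fraction of variables retained at run i (1-based).

    Assumed CARS-style schedule: the ratio is 1 at i = 1 and
    2 / n_vars at i = n_runs, decaying exponentially in between.
    """
    a = (n_vars / 2.0) ** (1.0 / (n_runs - 1))
    k = np.log(n_vars / 2.0) / (n_runs - 1)
    return a * np.exp(-k * i)


def weighted_bootstrap(indices, weights, rng):
    """Draw len(indices) samples with replacement, with probability
    proportional to weight, and return the distinct indices drawn."""
    w = np.asarray(weights, dtype=float)
    if w.sum() <= 0:
        # Degenerate case: fall back to uniform sampling.
        w = np.ones_like(w)
    draws = rng.choice(indices, size=len(indices), replace=True, p=w / w.sum())
    return np.unique(draws)


def shrink_once(active, scores, i, n_runs, n_vars, rng):
    """One selection step: forced EDF cut, then competitive sampling.

    `active` holds the wavelength indices still in play and `scores`
    holds their (hypothetical) importance values, in the same order.
    """
    keep = max(2, int(round(edf_ratio(i, n_runs, n_vars) * n_vars)))
    order = np.argsort(scores)[::-1][:keep]  # highest scores first
    return weighted_bootstrap(active[order], scores[order], rng)
```

In a full loop, each Monte Carlo run would refit a PLS model on a random fraction of the samples, recompute the scores for the surviving wavelengths, call `shrink_once`, and record the cross-validated error of the resulting subset so that the subset with the lowest error can be chosen at the end.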
Pages: 167195-167209
Number of pages: 15
Related papers
50 in total
  • [41] Application of Wavelet Component Selection and Orthogonal Signal Correction in the Multivariate Calibration by Near-Infrared Spectroscopy
    Peng, Dan
    Ji, Junmin
    Li, Xia
    Dong, Kaina
    ADVANCED RESEARCH ON COMPUTER SCIENCE AND INFORMATION ENGINEERING, PT I, 2011, 152 : 374 - 380
  • [42] A novel variable selection algorithm based on neural network for near-infrared spectral modeling
    Zhang, Pengfei
    Xu, Zhuopin
    Ma, Huimin
    Zheng, Lei
    Li, Xiaohong
    Zhang, Zhiyi
    Wu, Yuejin
    Wang, Qi
    ANALYTICA CHIMICA ACTA, 2024, 1330
  • [43] A Variable Selection Method Based on Ensemble-SISPLS for Near Infrared Spectroscopy
    Li Si-hai
    Zhao Lei
    SPECTROSCOPY AND SPECTRAL ANALYSIS, 2019, 39 (04) : 1047 - 1052
  • [44] Optical wavelength selection for portable hemoglobin determination by near-infrared spectroscopy method
    Tian, Han
    Li, Ming
    Wang, Yue
    Sheng, Dinggao
    Liu, Jun
    Zhang, Linna
    INFRARED PHYSICS & TECHNOLOGY, 2017, 86 : 98 - 102
  • [45] Fast quantitative detection of black pepper and cumin adulterations by near-infrared spectroscopy and multivariate modeling
    Sales de Lima, Amanda Beatriz
    Batista, Acsa Santos
    de Jesus, Josane Cardim
    Silva, Jaqueline de Jesus
    Mendes de Araujo, Antonia Cardoso
    Santos, Leandro Soares
    FOOD CONTROL, 2020, 107
  • [46] In-line monitoring of alcohol precipitation by near-infrared spectroscopy in conjunction with multivariate batch modeling
    Huang, Hongxia
    Qu, Haibin
    ANALYTICA CHIMICA ACTA, 2011, 707 (01) : 47 - 56
  • [47] Discretized butterfly optimization algorithm for variable selection in the rapid determination of cholesterol by near-infrared spectroscopy
    Bian, Xihui
    Zhao, Zizhen
    Liu, Jianwen
    Liu, Peng
    Shi, Huibing
    Tan, Xiaoyao
    ANALYTICAL METHODS, 2023, 15 (39) : 5190 - 5198
  • [48] Variable selection in near infrared spectroscopy based on significance testing in partial least squares regression
    Westad, F
    Martens, H
    JOURNAL OF NEAR INFRARED SPECTROSCOPY, 2000, 8 (02) : 117 - 124
  • [49] Monitoring complex media fermentations with near-infrared spectroscopy: Comparison of different variable selection methods
    Ferreira, AP
    Alves, TP
    Menezes, JC
    BIOTECHNOLOGY AND BIOENGINEERING, 2005, 91 (04) : 474 - 481
  • [50] Feature Variable Selection for Near-Infrared Spectroscopy Based on Simulated Annealing Bee Colony Algorithm
    Shi, Jianfei
    Tong, Baihong
    Liu, Jinming
    Chen, Zhengguang
    Li, Pengfei
    Tan, Chong
    JOURNAL OF CHEMOMETRICS, 2025, 39 (01)