A two-dimensional sample screening method based on data quality and variable correlation

Cited by: 6
Authors
Li, Gang [1 ,2 ]
Wang, Dan [1 ,2 ]
Wang, Kang [1 ,2 ]
Lin, Ling [1 ,2 ]
Affiliations
[1] Tianjin Univ, State Key Lab Precis Measurement Technol & Instru, Tianjin 300072, Peoples R China
[2] Tianjin Univ, China & Tianjin Key Lab Biomed Detecting Tech & I, Tianjin 300072, Peoples R China
Keywords
Sample screening; Training set; Dynamic spectrum; PLS; Spectral analysis; Mahalanobis distance; NEAR-INFRARED SPECTROSCOPY; NEURAL-NETWORK; SELECTION; CALIBRATION; SUBSET; CLASSIFICATION; ALGORITHM;
DOI
10.1016/j.aca.2022.339700
Chinese Library Classification (CLC) number
O65 [Analytical Chemistry]
Discipline classification codes
070302; 081704
Abstract
The selection of the training set is key to the quality of a model. In spectral analysis, various interference factors cause the spectral data collected from some samples to deviate seriously. If such data are used directly in modeling, they introduce bias into the model. Therefore, to obtain the most representative samples, samples must be screened before the model is established. This paper proposes a two-dimensional sample selection (TDSS) method, which selects samples from two angles: spectral data quality and variable correlation. This method and the Mahalanobis distance method were each applied to dynamic spectrum (DS) data to screen samples, and the samples screened by the two methods were used for modeling. Finally, a partial least squares (PLS) linear regression model with a quadratic nonlinear correction method was established to predict the target components. The experimental results show that the sample screening method significantly improved the accuracy and prediction performance of the model, and that it outperforms the Mahalanobis distance method. In the prediction of triglyceride and total cholesterol, the correlation coefficient can exceed 0.82. The experimental results demonstrate the effectiveness of the proposed sample selection method, which markedly improves the accuracy and robustness of the model. This paper provides a new approach to selecting modeling-set samples in the spectral analysis of complex solutions. (C) 2022 Elsevier B.V. All rights reserved.
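As an illustration of the baseline named in the abstract, the following is a minimal sketch of Mahalanobis-distance sample screening. It is not the authors' implementation and it does not reproduce TDSS. The function name `mahalanobis_screen`, the cutoff `threshold=3.0`, and the synthetic data are all assumptions made for this example. A pseudo-inverse is used because spectral data often have more variables than samples, which makes the covariance matrix singular.

```python
import numpy as np


def mahalanobis_screen(X, threshold=3.0):
    """Screen samples by Mahalanobis distance to the data mean.

    X is an (n_samples, n_features) array. `threshold` is a hypothetical
    cutoff chosen for illustration; it is not taken from the paper.
    Returns (keep_mask, distances).
    """
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    # Sample covariance of the features (columns are variables).
    cov = np.atleast_2d(np.cov(X, rowvar=False))
    # Pseudo-inverse tolerates a singular covariance matrix.
    cov_inv = np.linalg.pinv(cov)
    # Squared distance d_i^2 = c_i^T * cov_inv * c_i for every sample i.
    d2 = np.einsum("ij,jk,ik->i", centered, cov_inv, centered)
    dist = np.sqrt(np.maximum(d2, 0.0))
    return dist <= threshold, dist


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for spectra: 50 ordinary samples plus one gross outlier.
    X = np.vstack([rng.normal(size=(50, 2)), [[20.0, 20.0]]])
    keep, dist = mahalanobis_screen(X)
    print("samples kept:", int(keep.sum()), "of", len(X))
    print("outlier kept:", bool(keep[-1]))
```

Samples whose distance exceeds the cutoff are dropped before model fitting. In practice the cutoff and any dimensionality reduction applied beforehand would need to be chosen for the data at hand.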
Pages: 7
Related papers
31 in total
  • [1] [Anonymous], 2011, Acm T. Intel. Syst. Tec., DOI 10.1145/1961189.1961199
  • [2] The effect of spectral photoplethysmography amplification and its application in dynamic spectrum for effective noninvasive detection of blood components
    Awelisah, Yussif Moro
    Li, Gang
    Ijaz, Muhammad
    Lin, Ling
    [J]. OPTICS AND LASER TECHNOLOGY, 2021, 133
  • [3] Sample selection bias and Heckman models in strategic management research
    Certo, S. Trevis
    Busenbark, John R.
    Woo, Hyun-Soo
    Semadeni, Matthew
    [J]. STRATEGIC MANAGEMENT JOURNAL, 2016, 37 (13) : 2639 - 2657
  • [4] A simulation study on classic and robust variable selection in linear regression
    Çetin, Meral
    Erar, Aydin
    [J]. APPLIED MATHEMATICS AND COMPUTATION, 2006, 175 (02) : 1629 - 1643
  • [5] An adaptive strategy for selecting representative calibration samples in the continuous wavelet domain for near-infrared spectral analysis
    Chen, Da
    Cai, Wensheng
    Shao, Xueguang
    [J]. ANALYTICAL AND BIOANALYTICAL CHEMISTRY, 2007, 387 (03) : 1041 - 1048
  • [6] Classification of vinegar samples based on near infrared spectroscopy combined with wavelength selection
    Fan, Wei
    Li, Hongdong
    Shan, Yang
    Lv, Huiying
    Zhang, Huaxiu
    Liang, Yizeng
    [J]. ANALYTICAL METHODS, 2011, 3 (08) : 1872 - 1876
  • [7] Wavelength selection for portable noninvasive blood component measurement system based on spectral difference coefficient and dynamic spectrum
    Feng, Ximeng
    Li, Gang
    Yu, Haixia
    Wang, Shaohui
    Yi, Xiaoqing
    Lin, Ling
    [J]. SPECTROCHIMICA ACTA PART A-MOLECULAR AND BIOMOLECULAR SPECTROSCOPY, 2018, 193 : 40 - 46
  • [8] A method for calibration and validation subset partitioning
    Galvao, RKH
    Araujo, MCU
    José, GE
    Pontes, MJC
    Silva, EC
    Saldanha, TCB
    [J]. TALANTA, 2005, 67 (04) : 736 - 740
  • [9] Spectral data quality assessment based on variability analysis: application to noninvasive hemoglobin measurement by dynamic spectrum
    He, Wenqin
    Li, Xiaoxia
    Wang, Mengjun
    Li, Gang
    Lin, Ling
    [J]. ANALYTICAL METHODS, 2015, 7 (13) : 5565 - 5573
  • [10] Selection of a calibration sample subset by a semi-supervised method
    He, Zhonghai
    Ma, Zhenhe
    Li, Mengchao
    Zhou, Yang
    [J]. JOURNAL OF NEAR INFRARED SPECTROSCOPY, 2018, 26 (02) : 87 - 94