A two-dimensional sample screening method based on data quality and variable correlation

Cited by: 6
Authors
Li, Gang [1, 2]
Wang, Dan [1, 2]
Wang, Kang [1, 2]
Lin, Ling [1, 2]
Affiliations
[1] Tianjin Univ, State Key Lab Precis Measurement Technol & Instru, Tianjin 300072, Peoples R China
[2] Tianjin Univ, China & Tianjin Key Lab Biomed Detecting Tech & I, Tianjin 300072, Peoples R China
Keywords
Sample screening; Training set; Dynamic spectrum; PLS; Spectral analysis; Mahalanobis distance; NEAR-INFRARED SPECTROSCOPY; NEURAL-NETWORK; SELECTION; CALIBRATION; SUBSET; CLASSIFICATION; ALGORITHM;
DOI
10.1016/j.aca.2022.339700
Chinese Library Classification (CLC) number
O65 [Analytical Chemistry];
Discipline classification code
070302; 081704
Abstract
The selection of the training set is key to the quality of a model. In spectral analysis, various interference factors cause the collected spectral data of some samples to deviate seriously, and using such data directly in modeling biases the resulting model. Therefore, to obtain the most representative samples, samples must be screened before the model is established. This paper proposes a two-dimensional sample selection (TDSS) method, which selects samples from two perspectives: spectral data quality and variable correlation. TDSS and the Mahalanobis distance method were each applied to dynamic spectrum (DS) data to screen samples, and the samples screened by each method were used for modeling. A partial least squares (PLS) linear regression model with a quadratic nonlinear correction was then established to predict the target components. The experimental results show that the proposed sample screening method significantly improved the accuracy and prediction performance of the model and outperformed the Mahalanobis distance method. In the prediction of triglyceride and total cholesterol, the correlation coefficient can exceed 0.82. These results demonstrate the effectiveness of the proposed sample selection method in improving the accuracy and robustness of the model. This paper provides a new approach to selecting the modeling set in spectral analysis of complex solutions. (C) 2022 Elsevier B.V. All rights reserved.
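The abstract names Mahalanobis distance screening as the baseline but gives no implementation details. As a rough illustration only, the sketch below screens samples by their Mahalanobis distance from the sample mean. The function name `mahalanobis_screen`, the cutoff rule (mean distance plus `k` standard deviations), and the default `k = 3.0` are assumptions made for this example; they are not taken from the paper, and the sketch does not implement TDSS.

```python
import numpy as np


def mahalanobis_screen(X, k=3.0):
    """Flag samples whose Mahalanobis distance from the mean is unusually large.

    X : array of shape (n_samples, n_variables), e.g. one spectrum per row.
    k : hypothetical cutoff multiplier (assumption, not from the paper).

    Returns (keep_mask, distances).
    """
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)

    # Sample covariance of the variables; the pseudo-inverse tolerates the
    # rank-deficient case that is common when variables outnumber samples.
    cov = np.atleast_2d(np.cov(centered, rowvar=False))
    cov_inv = np.linalg.pinv(cov)

    # Squared Mahalanobis distance of each row: c_i^T * cov_inv * c_i.
    d2 = np.einsum("ij,jk,ik->i", centered, cov_inv, centered)
    distances = np.sqrt(np.maximum(d2, 0.0))

    # Assumed cutoff rule: mean distance plus k standard deviations.
    threshold = distances.mean() + k * distances.std()
    return distances <= threshold, distances


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    spectra = rng.normal(size=(50, 3))    # synthetic stand-in for spectral data
    spectra[0] = [25.0, -25.0, 25.0]      # one deliberately deviant sample
    keep, dist = mahalanobis_screen(spectra)
    print("samples kept:", int(keep.sum()), "of", len(keep))
```

In practice the retained rows (`spectra[keep]`) would then go to the regression step. Because the covariance is estimated from all samples, including the deviant ones, a robust covariance estimate may be preferable for heavily contaminated data.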
Pages: 7
Related papers
31 in total
  • [21] A sample selection method specific to unknown test samples for calibration and validation sets based on spectra similarity
    Sun, Yue
    Yuan, Meng
    Liu, Xiaoyan
    Su, Mei
    Wang, Linlin
    Zeng, Yingzi
    Zang, Hengchang
    Nie, Lei
    [J]. SPECTROCHIMICA ACTA PART A-MOLECULAR AND BIOMOLECULAR SPECTROSCOPY, 2021, 258
  • [22] Dynamic spectrum nonlinear modeling of VIS & NIR band based on RBF neural network for noninvasive blood component analysis to consider the effects of scattering
    Tang, Wei
    Yan, Wenjuan
    He, Guoquan
    Li, Gang
    Lin, Ling
    [J]. INFRARED PHYSICS & TECHNOLOGY, 2019, 96 : 77 - 83
  • [23] Weighted SPXY method for calibration set selection for composition analysis based on near-infrared spectroscopy
    Tian, Han
    Zhang, Linna
    Li, Ming
    Wang, Yue
    Sheng, Dinggao
    Liu, Jun
    Wang, Chengmin
    [J]. INFRARED PHYSICS & TECHNOLOGY, 2018, 95 : 88 - 92
  • [24] Tong SR, 2015, OXID COMMUN, V38, P1076
  • [25] A review on M plus N theory and its strategies to improve the accuracy of spectrochemical composition analysis of complex liquids
    Wan, Xinghua
    Li, Gang
    Li, Ting
    Yan, Wenjuan
    He, Guoquan
    Lin, Ling
    [J]. APPLIED SPECTROSCOPY REVIEWS, 2020, 55 (02) : 87 - 104
  • [26] Maximum Ambiguity-Based Sample Selection in Fuzzy Decision Tree Induction
    Wang, Xi-Zhao
    Dong, Ling-Cai
    Yan, Jian-Hui
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2012, 24 (08) : 1491 - 1505
  • [27] An Variable Selection Method of the Significance Multivariate Correlation Competitive Population Analysis for Near-Infrared Spectroscopy in Chemical Modeling
    Wang, Yuxi
    Jia, Zhenhong
    Yang, Jie
    [J]. IEEE ACCESS, 2019, 7 : 167195 - 167209
  • [28] Dynamic Spectrum for noninvasive blood component analysis and its advances
    Wang, Yuyu
    Li, Gang
    Wang, Huiquan
    Zhou, Mei
    Lin, Ling
    [J]. APPLIED SPECTROSCOPY REVIEWS, 2019, 54 (09) : 736 - 757
  • [29] Fast determination of oxide content in cement raw meal using NIR spectroscopy with the SPXY algorithm
    Yang, Zhenfa
    Xiao, Hang
    Zhang, Lei
    Feng, Dejun
    Zhang, Faye
    Jiang, Mingshun
    Sui, Qingmei
    Jia, Lei
    [J]. ANALYTICAL METHODS, 2019, 11 (31) : 3936 - 3942
  • [30] Determination of Hesperidin in Tangerine Leaf by Near-Infrared Spectroscopy with SPXY Algorithm for Sample Subset Partitioning and Monte Carlo Cross Validation
    Zhan Xiao-ri
    Zhu Xiang-rong
    Shi Xin-yuan
    Zhang Zhuo-yong
    Qiao Yan-jiang
    [J]. SPECTROSCOPY AND SPECTRAL ANALYSIS, 2009, 29 (04) : 964 - 968