Research on the Twin Check Abnormal Sample Detection Method of Mid-Infrared Spectroscopy

Authors
Zhangzhu Shan-ying [1, 2, 3]
Zhang Ruo-jing [1, 2, 3]
Gu Han-wen [5]
Xie Qin-lan [1, 2, 3]
Zhang Xian-wen [4]
Sa Ji-ming [5]
Liu Yi [3, 6]
Affiliations
[1] South Cent Minzu Univ, Coll Biomed Engn, Wuhan 430074, Peoples R China
[2] State Ethn Affairs Commiss, Key Lab Cognit Sci, Wuhan 430074, Peoples R China
[3] Hubei Key Lab Med Informat Anal & Tumor Diag & Tr, Wuhan 430074, Peoples R China
[4] Linyi Grepo Garden Machinery Co Ltd, Linyi 276700, Shandong, Peoples R China
[5] Wuhan Univ Technol, Sch Informat Engn, Wuhan 430070, Peoples R China
[6] Wuhan Univ Technol, Sch Mech & Elect Engn, Wuhan 430070, Peoples R China
Keywords
Infrared spectroscopy; Abnormal sample; Twin check; Adaptive threshold
DOI
10.3964/j.issn.1000-0593(2024)06-1546-07
Chinese Library Classification (CLC) number
O433 [Spectroscopy]
Discipline classification code
0703; 070302
Abstract
Mid-infrared absorption spectroscopy is one of the most promising techniques for non-invasive blood glucose measurement. The accuracy of glucose concentrations measured from mid-infrared absorption spectra depends closely on the reliability of the spectral signals. However, the acquisition of mid-infrared spectra is susceptible to environmental and human factors, which can produce abnormal spectra containing a large amount of interference. Such abnormal spectra reduce the effectiveness and reliability of the prediction model, so detecting and removing abnormal samples is crucial. This study proposes a twin check abnormal sample detection method that accurately screens and eliminates abnormal samples. The algorithm has two stages. First, a Monte Carlo cross-validation abnormal sample detection method preliminarily screens abnormal samples and improves the stability of the spectral sample set. Second, based on the theory that the squared Mahalanobis distance approximately follows a chi-square distribution, the optimal threshold is determined adaptively and the remaining data set is screened again for abnormal samples. The method was tested on 64 samples of a glucose-mixed simulated solution containing glucose, albumin, urea, lactic acid, fructose and cholesterol. In the first stage, the twin check method exploits the sensitivity of the sum of squared prediction errors to abnormal samples to make a preliminary judgment on the spectral data set, and 3 abnormal samples were detected. The PLS calibration model established after removing these samples had a correlation coefficient of 0.91 and an RMSECV of 60.17 mg/dL.
In the second stage, the twin check method uses the chi-square approximation of the squared Mahalanobis distance to identify abnormal samples adaptively. A total of 12 abnormal samples were detected. The PLS model constructed after removing all abnormal samples performed better, with a correlation coefficient of 0.99 and an RMSECV of 57.77 mg/dL. Comparing the twin check method with no abnormal sample removal, the PCA-MD method and the Monte Carlo method demonstrates the superiority of the algorithm in abnormal sample detection. Relative to the PLS model built without removing abnormal samples, the correlation coefficient increased from 0.86 to 0.99 and the RMSECV decreased from 67.51 to 57.77 mg/dL, improvements of 15.12% and 14.42%, respectively. Existing abnormal sample detection methods are easily influenced by the choice of threshold, which leads to false detection of normal samples or missed detection of abnormal samples. This study offers a good solution to that problem, enabling accurate detection and elimination of abnormal samples and thereby improving the accuracy and predictive performance of the model. The method provides a way to accurately eliminate abnormal samples from mid-infrared absorption spectra.
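The chi-square idea behind the second stage can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the threshold is adapted, so this example simply uses a fixed 97.5% chi-square quantile, works on synthetic two-dimensional score vectors (standing in for, say, two principal component scores), and uses hypothetical function names (`mahalanobis_sq_2d`, `chi2_quantile_df2`, `flag_outliers`). For two degrees of freedom the chi-square quantile has the exact closed form -2 ln(1 - p), which keeps the sketch free of external libraries.

```python
import math
import random


def mahalanobis_sq_2d(points):
    """Squared Mahalanobis distance of each 2-D point from the sample mean."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Entries of the sample covariance matrix (n - 1 denominator).
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    d2 = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # d^2 = [dx dy] S^-1 [dx dy]^T, using the closed-form 2x2 inverse.
        d2.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return d2


def chi2_quantile_df2(p):
    """Exact chi-square quantile for 2 degrees of freedom."""
    return -2.0 * math.log(1.0 - p)


def flag_outliers(points, p=0.975):
    """Indices whose squared distance exceeds the chi-square quantile.

    Assumption: a fixed quantile p, not the paper's adaptive threshold.
    """
    threshold = chi2_quantile_df2(p)
    d2 = mahalanobis_sq_2d(points)
    return [i for i, d in enumerate(d2) if d > threshold]


# Synthetic data: 60 well-behaved score vectors plus two gross outliers.
rng = random.Random(0)
scores = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(60)]
scores += [(8.0, -8.0), (9.0, 9.0)]
flagged = flag_outliers(scores)
print(flagged)
```

The two appended points (indices 60 and 61) lie far from the bulk of the data and exceed the threshold. Because the mean and covariance are estimated from data that include the outliers, a few borderline normal points may also be flagged or masked, which is the kind of threshold sensitivity the abstract says the adaptive rule is meant to address.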
Pages: 1546-1552
Number of pages: 7