MultiThal-classifier, a machine learning-based multi-class model for thalassemia diagnosis and classification

Times cited: 0
Authors
Wang, Wenqiang [1]
Ye, Renqing [1]
Tang, Baojia [1]
Qi, Yuying [1]
Affiliations
[1] Ningde Normal Univ, Dept Clin Lab, Ningde Municipal Hosp, 13 Mindong Rd East,Dongqiao Econ & Technol Dev Zon, Ningde 352100, Fujian, Peoples R China
Keywords
Thalassemia; Iron Deficiency Anemia; Machine Learning; Multi-Class Model; Hematological Parameters; IRON-DEFICIENCY
DOI
10.1016/j.cca.2024.120025
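Chinese Library Classification
R446 [Laboratory Diagnosis]; R-33 [Experimental Medicine, Medical Experiments]
Discipline classification code
1001
Abstract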
Background: The differential diagnosis between iron deficiency anemia (IDA) and thalassemia trait (TT) remains a significant clinical challenge. This study aimed to develop a machine learning-based multi-class model to differentiate among Microcytic-TT (TT with low mean corpuscular volume), Normocytic-TT (TT with normal mean corpuscular volume), IDA, and healthy individuals.
Methods: A comprehensive dataset comprising 1,819 individuals was analyzed using six distinct machine learning algorithms. The eXtreme Gradient Boosting (XGBoost) algorithm was ultimately selected to construct the MultiThal-Classifier (M-THAL) model. SMOTENC (Synthetic Minority Over-sampling Technique for Nominal and Continuous features) was employed to address data imbalance. Model performance was evaluated using various metrics, and SHAP values were applied to interpret the model's predictions. Additionally, external validation was conducted to assess the model's robustness and generalizability.
Results: After 1000 bootstrap resamples of the test set, the average performance metrics of M-THAL, with 95% confidence intervals (CI), were as follows: sensitivity 90.27% (95% CI: 84.88-95.26), specificity 97.87% (95% CI: 97.10-98.55), PPV 93.42% (95% CI: 89.34-96.48), NPV 97.82% (95% CI: 97.00-98.53), F1 score 91.50% (95% CI: 87.29-95.34), Youden's index 88.15% (95% CI: 82.33-93.70), accuracy 97.06% (95% CI: 96.06-97.99), and AUC 94.07% (95% CI: 91.17-96.84). Feature importance analysis identified mean corpuscular volume (MCV), mean corpuscular hemoglobin (MCH), red cell distribution width-standard deviation (RDW-SD), and hemoglobin (HGB) as the most important features. External validation confirmed the model's robustness and generalizability.
Conclusion: The M-THAL effectively distinguishes Normocytic-TT, Microcytic-TT, IDA, and healthy individuals using hematological parameters and offers a rapid and cost-effective screening tool that can be readily implemented in diverse healthcare settings.
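The bootstrap evaluation reported in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the abstract does not say how per-class metrics were averaged or how the interval was formed, so macro averaging over one-vs-rest classes and a percentile interval are assumptions here, and the function names and toy labels are hypothetical.

```python
import random


def one_vs_rest_metrics(y_true, y_pred, labels):
    """Macro-averaged sensitivity, specificity and Youden's index.

    Assumption: each class is scored one-vs-rest and the per-class
    values are averaged with equal weight (macro averaging).
    """
    sens, spec = [], []
    for c in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        tn = sum(t != c and p != c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        # A class can be absent from a resample; skip it to avoid 0/0.
        if tp + fn:
            sens.append(tp / (tp + fn))
        if tn + fp:
            spec.append(tn / (tn + fp))
    s = sum(sens) / len(sens)
    p = sum(spec) / len(spec)
    return {"sensitivity": s, "specificity": p, "youden": s + p - 1}


def bootstrap_ci(y_true, y_pred, labels, n_boot=1000, alpha=0.05, seed=0):
    """Mean and percentile CI of each metric over resamples of the test set."""
    rng = random.Random(seed)
    n = len(y_true)
    draws = {"sensitivity": [], "specificity": [], "youden": []}
    for _ in range(n_boot):
        # Resample test-set indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        m = one_vs_rest_metrics(
            [y_true[i] for i in idx], [y_pred[i] for i in idx], labels
        )
        for name, value in m.items():
            draws[name].append(value)
    result = {}
    for name, values in draws.items():
        values.sort()
        lower = values[int((alpha / 2) * (n_boot - 1))]
        upper = values[int((1 - alpha / 2) * (n_boot - 1))]
        result[name] = (sum(values) / n_boot, lower, upper)
    return result


# Hypothetical toy labels, for illustration only (not study data).
LABELS = ["Microcytic-TT", "Normocytic-TT", "IDA", "Healthy"]
toy_true = LABELS * 10
toy_pred = list(toy_true)
toy_pred[0] = "IDA"  # one deliberate misclassification

if __name__ == "__main__":
    for name, (mean, lower, upper) in bootstrap_ci(toy_true, toy_pred, LABELS).items():
        print(f"{name}: {mean:.4f} (95% CI: {lower:.4f}-{upper:.4f})")
```

Resampling only the held-out test set, as the abstract describes, estimates the variability of the metrics for one fixed model. It does not capture the variability that retraining the model would introduce.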
Pages: 10