MultiThal-classifier, a machine learning-based multi-class model for thalassemia diagnosis and classification

Citations: 0
Authors
Wang, Wenqiang [1]
Ye, Renqing [1]
Tang, Baojia [1]
Qi, Yuying [1]
Affiliations
[1] Ningde Normal Univ, Dept Clin Lab, Ningde Municipal Hosp, 13 Mindong Rd East,Dongqiao Econ & Technol Dev Zon, Ningde 352100, Fujian, Peoples R China
Keywords
Thalassemia; Iron Deficiency Anemia; Machine Learning; Multi-Class Model; Hematological Parameters; IRON-DEFICIENCY;
DOI
10.1016/j.cca.2024.120025
Chinese Library Classification number
R446 [Laboratory diagnosis]; R-33 [Experimental medicine, medical experiments]
Subject classification code
1001
Abstract
Background: The differential diagnosis between iron deficiency anemia (IDA) and thalassemia trait (TT) remains a significant clinical challenge. This study aimed to develop a machine learning-based multi-class model to differentiate among Microcytic-TT (TT with low mean corpuscular volume), Normocytic-TT (TT with normal mean corpuscular volume), IDA, and healthy individuals.
Methods: A dataset of 1,819 individuals was analyzed using six machine learning algorithms. The eXtreme Gradient Boosting (XGBoost) algorithm was ultimately selected to construct the MultiThal-Classifier (M-THAL) model. SMOTENC (Synthetic Minority Over-sampling Technique for Nominal and Continuous features) was employed to address class imbalance. Model performance was evaluated using the metrics reported below, and SHAP values were applied to interpret the model's predictions. Additionally, external validation was conducted to assess the model's robustness and generalizability.
Results: After 1,000 bootstrap resamples of the test set, the average performance metrics of M-THAL, with 95% confidence intervals (CI), were as follows: sensitivity 90.27% (95% CI: 84.88-95.26), specificity 97.87% (95% CI: 97.10-98.55), PPV 93.42% (95% CI: 89.34-96.48), NPV 97.82% (95% CI: 97.00-98.53), F1 score 91.50% (95% CI: 87.29-95.34), Youden's index 88.15% (95% CI: 82.33-93.70), accuracy 97.06% (95% CI: 96.06-97.99), and AUC 94.07% (95% CI: 91.17-96.84). Feature importance analysis identified mean corpuscular volume (MCV), mean corpuscular hemoglobin (MCH), red cell distribution width - standard deviation (RDW-SD), and hemoglobin (HGB) as the most important features. External validation confirmed the model's robustness and generalizability.
Conclusion: M-THAL effectively distinguishes Normocytic-TT, Microcytic-TT, IDA, and healthy individuals using hematological parameters, and it offers a rapid and cost-effective screening tool that can be readily implemented in diverse healthcare settings.
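The bootstrap evaluation described in the Results can be illustrated with a minimal sketch. It is not the authors' code: the abstract does not say how the per-class metrics were averaged or how the interval bounds were taken, so the macro-averaged one-vs-rest metrics and the percentile intervals below are assumptions, as are all function names and the toy labels. Youden's index is computed as sensitivity + specificity - 1, which matches the reported figures up to rounding (90.27 + 97.87 - 100 = 88.14, against the reported 88.15).

```python
import random

# The four class labels named in the abstract.
CLASSES = ["Microcytic-TT", "Normocytic-TT", "IDA", "Healthy"]


def one_vs_rest_metrics(y_true, y_pred, classes=CLASSES):
    """Macro-averaged one-vs-rest sensitivity, specificity and Youden's index.

    Assumption: macro averaging; the abstract does not state the scheme used.
    """
    sens, spec = [], []
    pairs = list(zip(y_true, y_pred))
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        tn = sum(t != c and p != c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        # A class can be absent from a resample; skip it to avoid 0/0.
        if tp + fn:
            sens.append(tp / (tp + fn))
        if tn + fp:
            spec.append(tn / (tn + fp))
    sensitivity = sum(sens) / len(sens)
    specificity = sum(spec) / len(spec)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "youden": sensitivity + specificity - 1,
    }


def bootstrap_ci(y_true, y_pred, n_resamples=1000, alpha=0.05, seed=0):
    """Mean and percentile CI of each metric over resamples of the test set."""
    rng = random.Random(seed)
    n = len(y_true)
    draws = {"sensitivity": [], "specificity": [], "youden": []}
    for _ in range(n_resamples):
        # Resample test-set indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        metrics = one_vs_rest_metrics(
            [y_true[i] for i in idx], [y_pred[i] for i in idx]
        )
        for name, value in metrics.items():
            draws[name].append(value)
    summary = {}
    for name, values in draws.items():
        values.sort()
        lower = values[int((alpha / 2) * (n_resamples - 1))]
        upper = values[int((1 - alpha / 2) * (n_resamples - 1))]
        summary[name] = (sum(values) / n_resamples, lower, upper)
    return summary


if __name__ == "__main__":
    # Toy labels (hypothetical, not study data): 40 samples per class,
    # with every tenth prediction deliberately set to a different class.
    y_true = [c for c in CLASSES for _ in range(40)]
    y_pred = list(y_true)
    for i in range(0, len(y_pred), 10):
        y_pred[i] = CLASSES[(CLASSES.index(y_pred[i]) + 1) % len(CLASSES)]
    for name, (mean, lower, upper) in bootstrap_ci(y_true, y_pred).items():
        print(f"{name}: {mean:.4f} (95% CI: {lower:.4f}-{upper:.4f})")
```

Resampling only the held-out test set, as here, quantifies uncertainty in the evaluation without refitting the model on each resample.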
Pages: 10