Comparison of machine learning techniques to handle imbalanced COVID-19 CBC datasets

Times cited: 11
Authors
Dorn, Marcio [1 ,2 ,3 ]
Grisci, Bruno Iochins [1 ]
Narloch, Pedro Henrique [1 ]
Feltes, Bruno Cesar [1 ,4 ]
Avila, Eduardo [3 ,5 ]
Kahmann, Alessandro [6 ]
Alho, Clarice Sampaio [3 ,5 ]
Affiliations
[1] Univ Fed Rio Grande do Sul, Inst Informat, Porto Alegre, RS, Brazil
[2] Univ Fed Rio Grande do Sul, Ctr Biotechnol, Porto Alegre, RS, Brazil
[3] Natl Inst Sci & Technol, Forens Sci, Porto Alegre, RS, Brazil
[4] Univ Fed Rio Grande do Sul, Dept Genet, Porto Alegre, RS, Brazil
[5] Pontificia Univ Catolica Rio Grande do Sul, Sch Hlth & Life Sci, Porto Alegre, RS, Brazil
[6] Fed Univ Rio Grande, Inst Math Stat & Phys, Rio Grande, RS, Brazil
Keywords
Machine learning; Data mining; Imbalanced datasets; Covid; Hemogram; CORONAVIRUS DISEASE 2019; CLASSIFICATION; WAVE; TREES; RISK; CT
DOI
10.7717/peerj-cs.670
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The coronavirus pandemic caused by the novel SARS-CoV-2 has significantly impacted human health and the economy, especially in countries with limited financial resources for medical testing and treatment, such as Brazil, the third most affected country by the pandemic. In this scenario, machine learning techniques have been heavily employed to analyze different types of medical data and to aid decision making, offering a low-cost alternative. Due to the urgency of fighting the pandemic, a large number of studies have applied machine learning approaches to clinical data, including complete blood count (CBC) tests, which are among the most widely available medical tests. In this work, we review the machine learning classifiers most commonly employed for CBC data, together with popular sampling methods for dealing with class imbalance. Additionally, we describe and critically analyze three publicly available Brazilian COVID-19 CBC datasets and evaluate the performance of eight classifiers and five sampling techniques on the selected datasets. Our work provides a panorama of which classifiers and sampling methods give the best results for different relevant metrics and discusses their impact on future analyses. The metrics and algorithms are introduced in a way that aids newcomers to the field. Finally, the panorama discussed here can significantly benefit the comparison of results from new ML algorithms.
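Class imbalance, the central issue in the abstract, means one class is much rarer than the other, so plain accuracy can look high even for a model that never detects the rare class. The following is a minimal Python sketch of the idea, assuming a binary label where 1 marks the positive class and data held in plain lists. The functions `random_oversample`, `random_undersample` and `binary_metrics` are hypothetical illustrations. Random over- and undersampling are only the simplest members of the sampling family and not necessarily among the five techniques the paper evaluates, and the metric set shown is an assumption rather than the paper's exact list.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of smaller classes (with replacement)
    until every class has as many rows as the majority class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        idx = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            i = rng.choice(idx)
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """Keep a random subset of every class, sized to the minority-class count."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    keep = []
    for label in counts:
        idx = [i for i, lab in enumerate(y) if lab == label]
        keep.extend(rng.sample(idx, target))
    keep.sort()  # preserve the original row order
    return [X[i] for i in keep], [y[i] for i in keep]


def binary_metrics(y_true, y_pred, positive=1):
    """Confusion-matrix metrics that stay informative under class imbalance."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    def ratio(a, b):
        # Return 0.0 instead of raising when a denominator is empty.
        return a / b if b else 0.0

    sensitivity = ratio(tp, tp + fn)  # recall on the positive class
    specificity = ratio(tn, tn + fp)
    precision = ratio(tp, tp + fp)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


if __name__ == "__main__":
    # Toy imbalanced data: 9 negative rows, 1 positive row.
    X = [[float(i)] for i in range(10)]
    y = [0] * 9 + [1]

    X_over, y_over = random_oversample(X, y)
    X_under, y_under = random_undersample(X, y)
    print(Counter(y_over), Counter(y_under))

    # A "classifier" that always predicts negative: accuracy 0.9,
    # but sensitivity 0.0 and balanced accuracy 0.5.
    print(binary_metrics(y, [0] * 10))
```

On the toy data, the always-negative predictor reaches 0.9 accuracy while detecting no positives, which is why imbalance-aware metrics such as sensitivity, balanced accuracy and F1 matter when comparing classifiers. In practice, resampling is applied only to the training split, never to the test data used for evaluation.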
Pages: 1-34 (34 pages)