Implementation of ensemble machine learning classifiers to predict diarrhoea with SMOTEENN, SMOTE, and SMOTETomek class imbalance approaches

Times cited: 1
Authors
Mbunge, Elliot [1]
Millham, Richard C. [2]
Sibiya, Maureen Nokuthula [3]
Chemhaka, Garikayi [4]
Takavarasha, Sam, Jr. [5]
Muchemwa, Benhildah [1]
Dzinamarira, Tafadzwa [6]
Affiliations
[1] Univ Eswatini, Dept Comp Sci, Fac Sci & Engn, Kwaluseni, Manzini, Eswatini
[2] Durban Univ Technol, Dept Informat, Fac Accounting & Informat, ZA-4001 Durban, South Africa
[3] Mangosuthu Univ Technol, Res Innovat & Engagement, 511 Griffiths Mxenge Hwy, ZA-4031 Umlazi, South Africa
[4] Univ Eswatini, Dept Stat & Demog, Fac Social Sci, Kwaluseni Campus, Kwaluseni, Eswatini
[5] Womens Univ Africa, Fac Management & Entrepreneurial Sci, Harare, Zimbabwe
[6] ICAP, Harare, Zimbabwe
Source
2023 CONFERENCE ON INFORMATION COMMUNICATIONS TECHNOLOGY AND SOCIETY, ICTAS | 2023
Keywords
Diarrhoea; ensemble methods; children; class imbalance; machine learning; prediction; Zimbabwe
DOI
10.1109/ICTAS56421.2023.10082744
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Diarrhoea continues to be a major public health burden and cause of death among children under five years of age in many developing countries. Rotavirus vaccination, hygiene practices, clean water, and health promotion are among the preventive measures implemented to improve child health. Nevertheless, tackling diarrhoea also requires integrating ensemble machine learning (ML) into health systems, yet such integration is still nascent in many developing countries. Therefore, this study applied the SMOTE, SMOTEENN, and SMOTETomek class imbalance approaches together with ensemble ML classifiers to predict diarrhoea. Ensemble methods significantly improve the performance of conventional ML classifiers. With SMOTEENN, the ExtraTrees classifier achieved a recall of 96.3%, accuracy of 94.3%, precision of 93.8%, and F1-score of 95%, outperforming the same classifier with SMOTE and SMOTETomek. The performance of the HistGradientBoosting classifier also improved, reaching a recall of 95.2%, accuracy of 91.5%, precision of 90.4%, and F1-score of 92.7%. The paper also shows that ensemble methods are increasingly becoming state-of-the-art solutions to several challenges encountered with ML algorithms, such as overfitting, underfitting, computational intensity, and representation problems. There is a need to develop data-driven applications that incorporate ensemble methods to model and predict diarrhoea, helping policymakers craft interventions aimed at improving child health.
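The abstract names SMOTEENN without showing what it does. The following is a minimal, dependency-free sketch of the idea, not the authors' pipeline (this record does not state which implementation they used). SMOTE synthesises new minority samples by interpolating between a minority sample and one of its k nearest minority neighbours. Edited Nearest Neighbours (ENN) then removes samples whose label disagrees with most of their nearest neighbours. Assumptions: binary labels, Euclidean distance, plain Python lists, and a majority-vote ENN rule (ENN variants differ in how strictly this rule is applied). The function names `smote`, `enn_clean`, and `smote_enn` and the toy data are hypothetical. SMOTETomek uses the same oversampling step but instead removes Tomek links, that is, pairs of opposite-class samples that are each other's nearest neighbours.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_indices(i, points, k):
    """Indices of the k points closest to points[i], excluding i itself."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: euclidean(points[i], points[j]))
    return others[:k]


def smote(minority, n_new, k=5, rng=None):
    """SMOTE step: create n_new synthetic samples, each placed at a random
    point on the segment between a minority sample and one of its k nearest
    minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(0) if rng is None else rng
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        neighbour = minority[rng.choice(nearest_indices(i, minority, k))]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def enn_clean(X, y, k=3):
    """ENN step (hypothetical majority-vote variant): keep a sample only if
    most of its k nearest neighbours share its label."""
    kept_X, kept_y = [], []
    for i in range(len(X)):
        neighbours = nearest_indices(i, X, k)
        agree = sum(1 for j in neighbours if y[j] == y[i])
        if 2 * agree > len(neighbours):
            kept_X.append(X[i])
            kept_y.append(y[i])
    return kept_X, kept_y


def smote_enn(X, y, minority_label=1, k_smote=5, k_enn=3, seed=0):
    """Binary SMOTEENN sketch: oversample the minority class up to the size
    of the other class, then clean the combined set with ENN."""
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_other = len(y) - len(minority)
    synthetic = smote(minority, max(0, n_other - len(minority)), k_smote, rng)
    X_res = [list(x) for x in X] + synthetic
    y_res = list(y) + [minority_label] * len(synthetic)
    return enn_clean(X_res, y_res, k_enn)


if __name__ == "__main__":
    # Toy, made-up data: 40 majority (label 0) vs 8 minority (label 1) points.
    data_rng = random.Random(42)
    X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(40)]
    X += [[data_rng.gauss(2.5, 1), data_rng.gauss(2.5, 1)] for _ in range(8)]
    y = [0] * 40 + [1] * 8
    X_res, y_res = smote_enn(X, y)
    print("before:", y.count(0), "vs", y.count(1))
    print("after: ", y_res.count(0), "vs", y_res.count(1))
```

In the toy run, SMOTE first brings the minority class up to 40 points. ENN may then drop overlapping boundary samples from either class, so the final counts depend on the random draw. For a SMOTETomek sketch, `enn_clean` would be swapped for a step that deletes Tomek links. In practice, a maintained library such as imbalanced-learn would normally be used in place of hand-written resampling.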
Pages: 90-95
Number of pages: 6