Exploring Early Prediction of Chronic Kidney Disease Using Machine Learning Algorithms for Small and Imbalanced Datasets

Cited by: 15
Authors
da Silveira, Andressa C. M. [1]
Sobrinho, Alvaro [2, 3]
da Silva, Leandro Dias [3]
Costa, Evandro de Barros [4]
Pinheiro, Maria Eliete [4]
Perkusich, Angelo [5]
Affiliations
[1] Univ Fed Campina Grande, Elect Engn Dept, BR-58428830 Campina Grande, Paraiba, Brazil
[2] Fed Univ Agreste Pernambuco, Comp Sci, BR-55292270 Garanhuns, Brazil
[3] Univ Fed Alagoas, Comp Inst, BR-57072900 Maceio, Alagoas, Brazil
[4] Univ Fed Alagoas, Fac Med, BR-57072900 Maceio, Alagoas, Brazil
[5] Univ Fed Campina Grande, Virtus Res Dev & Innovat Ctr, BR-58428830 Campina Grande, Paraiba, Brazil
Source
APPLIED SCIENCES-BASEL | 2022 / Vol. 12 / Issue 07
Keywords
primary care; machine learning; limited size datasets; public health; imbalanced datasets; SMOTE; CLASSIFICATION; GUIDELINE
DOI
10.3390/app12073673
Chinese Library Classification number
O6 [Chemistry]
Discipline classification code
0703
Abstract
Chronic kidney disease (CKD) is a worldwide public health problem, usually diagnosed in the late stages of the disease. To alleviate this issue, investment in early prediction is necessary. The purpose of this study is to assist the early prediction of CKD by addressing problems related to imbalanced and limited-size datasets. We used data from medical records of Brazilians with or without a diagnosis of CKD, containing the following attributes: hypertension, diabetes mellitus, creatinine, urea, albuminuria, age, gender, and glomerular filtration rate. We present an oversampling approach based on manual and automated augmentation. We experimented with the synthetic minority oversampling technique (SMOTE), Borderline-SMOTE, and Borderline-SMOTE SVM. We implemented models based on the following algorithms: decision tree (DT), random forest, and multi-class AdaBoosted DTs. We also applied the overall local accuracy and local class accuracy methods for dynamic classifier selection, and the k-nearest oracles-union, k-nearest oracles-eliminate, and META-DES methods for dynamic ensemble selection. We analyzed the models' performance using hold-out validation, multiple stratified cross-validation (CV), and nested CV. The DT model presented the highest accuracy score (98.99%) using manual augmentation and SMOTE. Our approach can assist in designing systems for the early prediction of CKD using imbalanced and limited-size datasets.
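The abstract names SMOTE as one of the oversampling techniques. As a rough illustration of the basic idea only (not the authors' implementation, and not Borderline-SMOTE), the sketch below creates synthetic minority-class points by interpolating between a minority sample and one of its k nearest minority neighbours. The function name `smote`, its parameters, and the toy values are assumptions made for this example; they do not come from the study's dataset.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Basic SMOTE sketch: interpolate between minority samples and
    their k nearest minority-class neighbours (Euclidean distance)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # at most len - 1 neighbours exist
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by distance to the chosen base point.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: [creatinine (mg/dL), age (years)]; illustrative values only.
minority = [[2.1, 61.0], [2.4, 66.0], [1.9, 58.0], [2.8, 70.0]]
majority_count = 10  # assumed size of the majority class
new_points = smote(minority, n_synthetic=majority_count - len(minority), k=3)
balanced_minority = minority + new_points
print(len(balanced_minority))
```

Interpolation of this kind suits continuous attributes such as creatinine or age; binary attributes such as hypertension or diabetes mellitus would need separate handling, which this sketch does not attempt.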
Pages: 25
Related papers
50 in total
  • [1] Early Prediction of Chronic Kidney Disease Using Machine Learning Algorithms with Feature Selection Techniques
    Habiba, Sultana Umme
    Tasnim, Farzana
    Chowdhury, Mohammad Saeed Hasan
    Islam, Md Khairul
    Nahar, Lutfun
    Mahmud, Tanjim
    Kaiser, M. Shamim
    Hossain, Mohammad Shahadat
    Andersson, Karl
    APPLIED INTELLIGENCE AND INFORMATICS, AII 2023, 2024, 2065 : 224 - 242
  • [2] Machine Learning Algorithms as a Boon for Chronic Kidney Disease Prediction
    Dayma, Reshma
    Patel, Sajid
    Patel, Dhruti
    JOURNAL OF ELECTRICAL SYSTEMS, 2024, 20 (03) : 499 - 508
  • [3] PREDICTION OF DISEASE SEVERITY USING MACHINE LEARNING ALGORITHMS: AN ANALYSIS FOR CHRONIC KIDNEY DISEASE IN THE US
    Verma, V.
    Rastogi, M.
    Bharti, S.
    Pandey, S.
    Sanyal, S.
    Bansal, V
    Gaur, A.
    Daral, S.
    Kukreja, I
    Nayyar, A.
    Roy, A.
    VALUE IN HEALTH, 2024, 27 (06) : S393 - S393
  • [4] Prediction of Adult Chronic Kidney Disease with Class-Imbalanced Datasets
    Zhu Cuiliang
    Yuan Jiucun
    Yang Chenwei
    2021 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2021, : 3231 - 3238
  • [5] Chronic Kidney Disease Prediction Using Machine Learning
    Kaur, Chamandeep
    Kumar, M. Sunil
    Anjum, Afsana
    Binda, M. B.
    Mallu, Maheswara Reddy
    Al Ansari, Mohammed Saleh
    JOURNAL OF ADVANCES IN INFORMATION TECHNOLOGY, 2023, 14 (02) : 384 - 391
  • [6] Predicting Chronic Kidney Disease Using Machine Learning Algorithms
    Farjana, Afia
    Liza, Fatema Tabassum
    Pandit, Parth Pratim
    Das, Madhab Chandra
    Hasan, Mahadi
    Tabassum, Fariha
    Hossen, Md. Helal
    2023 IEEE 13TH ANNUAL COMPUTING AND COMMUNICATION WORKSHOP AND CONFERENCE, CCWC, 2023, : 1267 - 1271
  • [7] Early Prediction of Chronic Kidney Disease Using Machine Learning Supported by Predictive Analytics
    Aljaaf, Ahmed J.
    Al-Jumeily, Dhiya
    Haglan, Hussein M.
    Alloghani, Mohamed
    Baker, Thar
    Hussain, Abir J.
    Mustafina, Jamila
    2018 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC), 2018, : 251 - 259
  • [8] Performance Analysis of Machine Learning Algorithms on Imbalanced Datasets Using SMOTE Technique
    Kumar, Bala Santhosh
    Yadav, Pasupula Praveen
    Prasad, P. Penchala
    PROCEEDINGS OF THE 5TH INTERNATIONAL CONFERENCE ON DATA SCIENCE, MACHINE LEARNING AND APPLICATIONS, VOL 1, ICDSMLA 2023, 2025, 1273 : 147 - 156
  • [9] Chronic kidney disease prediction using machine learning techniques
    Debal, Dibaba Adeba
    Sitote, Tilahun Melak
    JOURNAL OF BIG DATA, 2022, 9 (01)
  • [10] Chronic Kidney Disease Prediction Using Machine Learning Methods
    Ekanayake, Imesh Udara
    Herath, Damayanthi
    MERCON 2020: 6TH INTERNATIONAL MULTIDISCIPLINARY MORATUWA ENGINEERING RESEARCH CONFERENCE (MERCON), 2020, : 260 - 265