Effect of data preprocessing on ensemble learning for classification in disease diagnosis

Times cited: 1
Authors
Ozkan, Yuksel [1]
Demirarslan, Mert [1]
Suner, Asli [1]
Affiliation
[1] Ege Univ, Fac Med, Dept Biostat & Med Informat, TR-35100 Izmir, Turkey
Keywords
Diagnosis; Data preprocessing; Missing data; Class noise; Class imbalance; Ensemble learning; MULTIPLE IMPUTATION; LABEL NOISE; ROC CURVE; MACHINE; MEDICINE
DOI
10.1080/03610918.2022.2053717
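Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject classification code
020208; 070103; 0714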
Abstract
In recent years, supervised machine learning methods have attracted increasing attention for extracting clinically relevant information from complex health data. Ensemble learning methods make it possible to build more successful models by training multiple learners jointly to solve the same problem. Herein, we aimed to compare the performance of classification algorithms after applying data preprocessing to address problems such as missing data, class noise, and class imbalance, which may be encountered in the datasets used to make an accurate disease diagnosis. To this end, we used random forest and weighted subspace random forest as bagging algorithms, and additive logistic regression and gradient boosted machines as boosting algorithms. We also measured the running time of each algorithm alongside its performance. Our findings indicated that the algorithms performed better after data preprocessing and that the boosting algorithms outperformed the bagging algorithms. We also observed that the boosting algorithms had the longest running times. In conclusion, complementing existing studies, our work highlights the importance and effect of using multiple data preprocessing methods together.
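The abstract does not say which specific imputation, noise-handling, or resampling techniques were combined. Purely as an illustration of chaining several preprocessing steps before an ensemble classifier is trained, the sketch below uses three simple stand-ins in plain Python: column-median imputation for missing data, an edited-nearest-neighbours (ENN) style filter for class (label) noise, and random oversampling for class imbalance. These techniques and the function names `impute_median`, `filter_label_noise`, `random_oversample`, and `preprocess` are hypothetical choices for this example, not the authors' method (the keywords mention multiple imputation, for instance, which is more elaborate than a median fill). Training the random forest or boosting models on the output is not shown.

```python
import random
from statistics import median


def impute_median(rows):
    """Fill None entries with the median of the observed values in that column.
    (Illustrative stand-in for missing-data handling.)"""
    n_cols = len(rows[0])
    medians = [median([r[j] for r in rows if r[j] is not None]) for j in range(n_cols)]
    return [[medians[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]


def filter_label_noise(X, y, k=3):
    """ENN-style filter (illustrative stand-in for class-noise handling):
    keep a sample only if a strict majority of its k nearest neighbours,
    by squared Euclidean distance, share its label."""
    keep_X, keep_y = [], []
    for i, xi in enumerate(X):
        neighbours = sorted(
            (sum((a - b) ** 2 for a, b in zip(xi, xj)), y[j])
            for j, xj in enumerate(X)
            if j != i
        )[:k]
        agree = sum(1 for _, label in neighbours if label == y[i])
        if agree > len(neighbours) // 2:
            keep_X.append(xi)
            keep_y.append(y[i])
    return keep_X, keep_y


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every class
    has as many rows as the largest one (illustrative stand-in for
    class-imbalance handling)."""
    rng = random.Random(seed)
    counts = {c: y.count(c) for c in set(y)}
    target = max(counts.values())
    X_out, y_out = [list(r) for r in X], list(y)
    for c in sorted(counts):
        members = [x for x, label in zip(X, y) if label == c]
        for _ in range(target - counts[c]):
            X_out.append(list(rng.choice(members)))
            y_out.append(c)
    return X_out, y_out


def preprocess(X, y, k=3, seed=0):
    """Hypothetical pipeline: impute, then filter label noise, then rebalance."""
    X = impute_median(X)
    X, y = filter_label_noise(X, y, k)
    return random_oversample(X, y, seed)


if __name__ == "__main__":
    X = [[1.0, 2.0], [1.1, None], [0.9, 2.1], [1.0, 1.9],
         [5.0, 5.0], [5.1, 4.9], [1.05, 2.05]]
    y = [0, 0, 0, 0, 1, 1, 1]  # last row: a class-1 label inside the class-0 cluster
    Xp, yp = preprocess(X, y)
    print(len(Xp), yp)  # prints: 8 [0, 0, 0, 0, 1, 1, 1, 1]
```

The order of the steps in this sketch is a design choice: imputation runs first because the distance-based noise filter needs complete rows, and oversampling runs last so that duplicated rows cannot shield a mislabeled sample from the noise filter.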
Pages: 1657-1677
Number of pages: 21