Effect of data preprocessing on ensemble learning for classification in disease diagnosis

Cited by: 1
Authors
Ozkan, Yuksel [1]
Demirarslan, Mert [1]
Suner, Asli [1]
Affiliations
[1] Ege Univ, Fac Med, Dept Biostat & Med Informat, TR-35100 Izmir, Turkey
Keywords
Diagnosis; Data preprocessing; Missing data; Class noise; Class imbalance; Ensemble learning; Multiple imputation; Label noise; ROC curve; Machine; Medicine
DOI
10.1080/03610918.2022.2053717
Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract
In recent years, supervised machine learning methods have attracted increasing attention for extracting clinically relevant information from complex health data. Ensemble learning methods enable more successful models to be built by training multiple learners jointly to solve the same problem. Herein, we aimed to compare the performance of classification algorithms after applying data preprocessing to address problems such as missing data, class noise, and class imbalance, which may be encountered in datasets used for accurate disease diagnosis. To this end, we used random forest and weighted subspace random forest as bagging algorithms, and additive logistic regression and gradient boosted machines as boosting algorithms. The performance and running time of the algorithms were calculated. Our findings indicated that the performance of the algorithms increased after data preprocessing and that the boosting algorithms outperformed the bagging algorithms. We also observed that the boosting algorithms had the longest running times. In conclusion, complementing existing studies, our work highlights the importance and effect of using multiple data preprocessing methods together.
Pages: 1657-1677
Number of pages: 21