Effect of data preprocessing on ensemble learning for classification in disease diagnosis

Cited by: 1
Authors
Ozkan, Yuksel [1]
Demirarslan, Mert [1]
Suner, Asli [1]
Affiliations
[1] Ege Univ, Fac Med, Dept Biostat & Med Informat, TR-35100 Izmir, Turkey
Keywords
Diagnosis; Data preprocessing; Missing data; Class noise; Class imbalance; Ensemble learning; MULTIPLE IMPUTATION; LABEL NOISE; ROC CURVE; MACHINE; MEDICINE
DOI
10.1080/03610918.2022.2053717
Chinese Library Classification number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract
In recent years, supervised machine learning methods have attracted increasing attention for extracting clinically relevant information from complex health data. Ensemble learning methods enable more successful models by training multiple learners to solve the same problem. Herein, we aimed to compare the performance of classification algorithms after applying data preprocessing to problems such as missing data, class noise, and class imbalance, which may be encountered in datasets used for disease diagnosis. To this end, we used random forest and weighted subspace random forest as bagging algorithms, and additive logistic regression and gradient boosted machines as boosting algorithms. The performance and running time of the algorithms were also calculated. Our findings indicated that the performance of the algorithms increased after data preprocessing and that the boosting algorithms outperformed the bagging algorithms. We also observed that the boosting algorithms had the longest running times. In conclusion, complementing existing studies, our work highlights the importance and effect of using multiple data preprocessing methods together.
Pages: 1657-1677
Page count: 21
Related papers
50 in total
  • [1] Multiple Imputation and Ensemble Learning for Classification with Incomplete Data
    Cao Truong Tran
    Zhang, Mengjie
    Andreae, Peter
    Xue, Bing
    Lam Thu Bui
    INTELLIGENT AND EVOLUTIONARY SYSTEMS, IES 2016, 2017, 8 : 401 - 415
  • [2] Data Preprocessing and Dynamic Ensemble Selection for Imbalanced Data Stream Classification
    Zyblewski, Pawel
    Sabourin, Robert
    Wozniak, Michal
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2019, PT II, 2020, 1168 : 367 - 379
  • [3] Ensemble Learning Classification for Medical Diagnosis
    Lohumi, Pratyush
    Garg, Sarthak
    Singh, Taran Pal
    Gopal, Madan
    PROCEEDINGS OF THE 2020 5TH INTERNATIONAL CONFERENCE ON COMPUTING, COMMUNICATION AND SECURITY (ICCCS-2020), 2020
  • [4] New data preprocessing trends based on ensemble of multiple preprocessing techniques
    Mishra, Puneet
    Biancolillo, Alessandra
    Roger, Jean Michel
    Marini, Federico
    Rutledge, Douglas N.
    TRAC-TRENDS IN ANALYTICAL CHEMISTRY, 2020, 132
  • [5] A preprocessing method combined with an ensemble framework for the multiclass imbalanced data classification
    Pavan Kumar M.R.
    Jayagopal P.
    International Journal of Computers and Applications, 2022, 44 (12) : 1178 - 1185
  • [6] Efficient Data Preprocessing with Ensemble Machine Learning Technique for the Early Detection of Chronic Kidney Disease
    Venkatesan, Vinoth Kumar
    Ramakrishna, Mahesh Thyluru
    Izonin, Ivan
    Tkachenko, Roman
    Havryliuk, Myroslav
    APPLIED SCIENCES-BASEL, 2023, 13 (05)
  • [7] Meta-learning for imbalanced data and classification ensemble in binary classification
    Lin, Sung-Chiang
    Chang, Yuan-chin I.
    Yang, Wei-Ning
    NEUROCOMPUTING, 2009, 73 (1-3) : 484 - 494
  • [8] Hierarchical Ensemble Learning for Alzheimer's Disease Classification
    Wang, Ruyue
    Li, Hanhui
    Lan, Rushi
    Luo, Suhuai
    Luo, Xiaonan
    2018 7TH INTERNATIONAL CONFERENCE ON DIGITAL HOME (ICDH 2018), 2018 : 224 - 229
  • [9] Intrusion detection based on ensemble learning for big data classification
    Jemili, Farah
    Meddeb, Rahma
    Korbaa, Ouajdi
    CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2024, 27 (03) : 3771 - 3798
  • [10] imDC: an ensemble learning method for imbalanced classification with miRNA data
    Wang, C. Y.
    Hu, L. L.
    Guo, M. Z.
    Liu, X. Y.
    Zou, Q.
    GENETICS AND MOLECULAR RESEARCH, 2015, 14 (01) : 123 - 133