Effect of data preprocessing on ensemble learning for classification in disease diagnosis

Cited by: 1
Authors
Ozkan, Yuksel [1]
Demirarslan, Mert [1]
Suner, Asli [1]
Affiliation
[1] Ege Univ, Fac Med, Dept Biostat & Med Informat, TR-35100 Izmir, Turkey
Keywords
Diagnosis; Data preprocessing; Missing data; Class noise; Class imbalance; Ensemble learning; Multiple imputation; Label noise; ROC curve; Machine; Medicine
DOI
10.1080/03610918.2022.2053717
Chinese Library Classification number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract
In recent years, supervised machine learning methods have attracted increasing attention for extracting clinically relevant information from complex health data. Ensemble learning methods make it possible to build more successful models by training multiple learners jointly to solve the same problem. Herein, we aimed to compare the performance of classification algorithms after applying data preprocessing to problems that may be encountered in the datasets used for accurate disease diagnosis, such as missing data, class noise, and class imbalance. To this end, we used random forest and weighted subspace random forest as bagging algorithms, and additive logistic regression and gradient boosted machines as boosting algorithms. The running time of each algorithm was measured alongside its classification performance. Our findings indicated that the performance of the algorithms increased after data preprocessing and that the boosting algorithms achieved higher performance than the bagging algorithms. We also observed that the boosting algorithms had the longest running times. In conclusion, complementing existing studies, our work highlights the importance and effect of using multiple data preprocessing methods together.
Pages: 1657-1677
Number of pages: 21