Data Preprocessing for DES-KNN and Its Application to Imbalanced Medical Data Classification

Cited by: 5
Authors
Kinal, Maciej [1 ]
Wozniak, Michal [1 ]
Affiliation
[1] Wroclaw Univ Sci & Technol, Fac Elect, Dept Syst & Comp Networks, Wroclaw, Poland
Source
INTELLIGENT INFORMATION AND DATABASE SYSTEMS (ACIIDS 2020), PT I | 2020 / Vol. 12033
Keywords
Dynamic ensemble selection; DES-KNN; Data preprocessing; Imbalanced data; Oversampling; SELECTION;
DOI
10.1007/978-3-030-41964-6_51
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
Learning from imbalanced data is a vital challenge for pattern classification. Imbalanced data are common in medical decision tasks, where at least one of the classes is represented by only a very small minority of the available data. We propose a novel framework for training base classifiers and preparing the dynamic selection dataset (DSEL) that integrates data preprocessing with dynamic ensemble selection (DES) methods for imbalanced data classification. The DES-KNN algorithm was chosen as the DES method, and its modifications based on training and validation sets oversampled using SMOTE are discussed. The proposed modifications were evaluated in computer experiments carried out on 15 medical datasets with various imbalance ratios. The results show that the proposed framework is very useful, especially for tasks characterized by a small imbalance ratio.
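The abstract names SMOTE as the oversampling step applied to the training and validation sets. The record does not give the authors' implementation, so the following is only a minimal, standard-library sketch of the general SMOTE idea (interpolating between a minority sample and one of its nearest minority neighbours). The function names `smote` and `oversample`, the toy data, and the choice to balance the classes exactly are assumptions made for illustration; the DES-KNN selection step is not shown.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points, each interpolated between a random
    minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Other minority samples, ordered by Euclidean distance to the base point
        others = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def oversample(X, y, minority_label, k=5, seed=0):
    """Add synthetic minority samples until both classes have equal size
    (hypothetical helper; assumes a binary problem)."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(X) - len(minority)
    new = smote(minority, max(0, n_majority - len(minority)), k=k, seed=seed)
    return list(X) + new, list(y) + [minority_label] * len(new)


# Toy data (invented): 6 majority samples (label 0), 3 minority samples (label 1)
X_train = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.1, 0.4), (0.4, 0.2),
           (1.0, 1.0), (1.2, 0.9), (0.9, 1.1)]
y_train = [0] * 6 + [1] * 3

X_bal, y_bal = oversample(X_train, y_train, minority_label=1, k=2)
```

In a pipeline like the one the abstract outlines, the same kind of helper would presumably be applied once to the data used to train the base classifiers and once to the validation data that forms the DSEL; how the authors combine these variants is described in the paper itself, not in this record.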
Pages: 589-599
Number of pages: 11
Related papers
50 in total
  • [1] Data Preprocessing and Dynamic Ensemble Selection for Imbalanced Data Stream Classification
    Zyblewski, Pawel
    Sabourin, Robert
    Wozniak, Michal
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2019, PT II, 2020, 1168 : 367 - 379
  • [2] Imbalanced Data Stream Classification Using Hybrid Data Preprocessing
    Bobowska, Barbara
    Klikowski, Jakub
    Wozniak, Michal
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2019, PT II, 2020, 1168 : 402 - 413
  • [3] Granular Computing and Parameters Tuning in Imbalanced Data Preprocessing
    Borowska, Katarzyna
    Stepaniuk, Jaroslaw
    COMPUTER INFORMATION SYSTEMS AND INDUSTRIAL MANAGEMENT, CISIM 2018, 2018, 11127 : 233 - 245
  • [4] Framework for imbalanced data classification
    Blaszczyk, Mikolaj
    Jedrzejowicz, Joanna
    KNOWLEDGE-BASED AND INTELLIGENT INFORMATION & ENGINEERING SYSTEMS (KSE 2021), 2021, 192 : 3477 - 3486
  • [5] The study of preprocessing methods' utility in analysis of multidimensional and highly imbalanced medical data
    Werner, Aleksandra
    Bach, Malgorzata
    Pluskiewicz, Wojciech
    PROCEEDINGS OF THE 11TH SCIENTIFIC CONFERENCE INTERNET IN THE INFORMATION SOCIETY 2016, 2016, : 71 - 87
  • [6] A preprocessing method combined with an ensemble framework for the multiclass imbalanced data classification
    Pavan Kumar M.R.
    Jayagopal P.
    International Journal of Computers and Applications, 2022, 44 (12) : 1178 - 1185
  • [7] Application of Imbalanced Data Classification Quality Metrics as Weighting Methods of the Ensemble Data Stream Classification Algorithms
    Wegier, Weronika
    Ksieniewicz, Pawel
    ENTROPY, 2020, 22 (08)
  • [8] Research on imbalanced data set preprocessing based on deep learning
    Wang Fangyu
    Zhang Jianhui
    Bu Youjun
    Chen Bo
    2021 ASIA-PACIFIC CONFERENCE ON COMMUNICATIONS TECHNOLOGY AND COMPUTER SCIENCE (ACCTCS 2021), 2021, : 75 - 79
  • [9] Potential Anchoring for imbalanced data classification
    Koziarski, Michal
    PATTERN RECOGNITION, 2021, 120
  • [10] Leveraging GANs data augmentation for imbalanced medical image classification
    Ding, Hongwei
    Huang, Nana
    Cui, Xiaohui
    APPLIED SOFT COMPUTING, 2024, 165