Granular Computing and Parameters Tuning in Imbalanced Data Preprocessing

Times cited: 3
Authors
Borowska, Katarzyna [1]
Stepaniuk, Jaroslaw [1]
Affiliation
[1] Bialystok Tech Univ, Fac Comp Sci, Wiejska 45A, PL-15351 Bialystok, Poland
Source
COMPUTER INFORMATION SYSTEMS AND INDUSTRIAL MANAGEMENT, CISIM 2018 | 2018 / Vol. 11127
Keywords
Data preprocessing; Imbalanced data; Rough sets; Oversampling; Parameters tuning; Information granules; SMOTE
DOI
10.1007/978-3-319-99954-8_20
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Selective preprocessing, which represents a data-level approach to the imbalanced data problem, is one of the most successful methods. This paper introduces a novel algorithm that combines this kind of technique with a filtering phase. Information granules are formed to distinguish specific types of positive (minority class) examples that should be treated appropriately. Three modes of oversampling are available, each dedicated to minority class instances placed in specific areas of the feature space. Rough set theory is applied to filter out inconsistencies from the generated positive samples. The experimental study shows that, in most cases, the proposed method yields better or similar performance of standard classifiers, such as the C4.5 decision tree, compared with other techniques. Additionally, multiple values of the algorithm's parameters are evaluated. The experiments show that two of the examined parameter values are the most appropriate for various applications. However, automatic parameter tuning, based on the specific requirements of different data distributions, is recommended.
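The abstract describes a three-stage pipeline: categorize positive examples with information granules, oversample them in one of three modes, then filter the synthetic samples with rough sets. The Python sketch below wires up one possible version of that pipeline. It is an illustration under stated assumptions, not the authors' algorithm. It assumes that:

- a minority example's "granule" is its k-nearest-neighbour neighbourhood, and examples are labelled safe, borderline or noise by counting minority neighbours (a common heuristic, not necessarily the paper's granule construction);
- synthetic samples come from SMOTE-style linear interpolation;
- the mode names `safe`, `borderline` and `both` are hypothetical stand-ins for the paper's three modes;
- a synthetic sample is kept only if it lies in the upper approximation of the minority class, meaning its neighbourhood among the original examples contains at least one minority example.

All function names (`categorize_minority`, `smote_like`, `in_upper_approximation`, `preprocess`) are hypothetical.

```python
import numpy as np

def knn(X_ref, x, k, exclude=None):
    """Indices of the k rows of X_ref closest to x (Euclidean distance)."""
    d = np.linalg.norm(X_ref - x, axis=1)
    if exclude is not None:
        d[exclude] = np.inf  # never count a point as its own neighbour
    return np.argsort(d)[:k]

def categorize_minority(X, y, k=5):
    """Map each minority index (y == 1) to 'safe', 'borderline' or 'noise'.
    Hypothetical rule: count minority examples among the k nearest neighbours."""
    types = {}
    for i in np.flatnonzero(y == 1):
        same = int(np.sum(y[knn(X, X[i], k, exclude=i)] == 1))
        if same == 0:
            types[i] = "noise"
        elif same > k / 2:
            types[i] = "safe"
        else:
            types[i] = "borderline"
    return types

def smote_like(X, y, seeds, n_per_seed, k=5, rng=None):
    """SMOTE-style interpolation between each seed and a random minority neighbour."""
    rng = np.random.default_rng() if rng is None else rng
    minority = np.flatnonzero(y == 1)
    out = []
    for i in seeds:
        # k nearest minority neighbours of seed i (positions within `minority`)
        pos = knn(X[minority], X[i], min(k, len(minority) - 1),
                  exclude=np.flatnonzero(minority == i))
        if len(pos) == 0:
            continue  # a lone minority example has no partner to interpolate with
        for _ in range(n_per_seed):
            j = minority[rng.choice(pos)]
            out.append(X[i] + rng.random() * (X[j] - X[i]))
    return np.array(out).reshape(-1, X.shape[1])

def in_upper_approximation(X, y, s, k=5):
    """Rough-set-inspired check: the k-NN neighbourhood of s among the original
    examples (its assumed 'granule') must contain at least one minority example."""
    return bool(np.any(y[knn(X, s, k)] == 1))

def preprocess(X, y, mode="borderline", n_per_seed=2, k=5, seed=0):
    """Hypothetical pipeline: categorize -> oversample chosen types -> filter."""
    rng = np.random.default_rng(seed)
    types = categorize_minority(X, y, k)
    wanted = {"safe": {"safe"}, "borderline": {"borderline"},
              "both": {"safe", "borderline"}}[mode]  # illustrative modes only
    seeds = [i for i, t in types.items() if t in wanted]
    synth = smote_like(X, y, seeds, n_per_seed, k, rng)
    keep = [s for s in synth if in_upper_approximation(X, y, s, k)]
    X_new = np.vstack([X] + keep) if keep else X.copy()
    y_new = np.concatenate([y, np.ones(len(keep), dtype=y.dtype)])
    return X_new, y_new
```

For example, `preprocess(X, y, mode="both")` returns the original rows followed by the retained synthetic minority rows. The upper-approximation test is deliberately lenient. Requiring membership in the lower approximation instead (all neighbours are minority examples) would discard more synthetic samples, especially in borderline regions.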
Pages: 233-245
Number of pages: 13
Related papers
  • [1] A rough-granular approach to the imbalanced data classification problem
    Borowska, K.
    Stepaniuk, J.
    APPLIED SOFT COMPUTING, 2019, 83
  • [2] Data Preprocessing and Dynamic Ensemble Selection for Imbalanced Data Stream Classification
    Zyblewski, Pawel
    Sabourin, Robert
    Wozniak, Michal
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2019, PT II, 2020, 1168 : 367 - 379
  • [3] Imbalanced Data Stream Classification Using Hybrid Data Preprocessing
    Bobowska, Barbara
    Klikowski, Jakub
    Wozniak, Michal
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2019, PT II, 2020, 1168 : 402 - 413
  • [4] Data Preprocessing for DES-KNN and Its Application to Imbalanced Medical Data Classification
    Kinal, Maciej
    Wozniak, Michal
    INTELLIGENT INFORMATION AND DATABASE SYSTEMS (ACIIDS 2020), PT I, 2020, 12033 : 589 - 599
  • [5] Research on imbalanced data set preprocessing based on deep learning
    Wang Fangyu
    Zhang Jianhui
    Bu Youjun
    Chen Bo
    2021 ASIA-PACIFIC CONFERENCE ON COMMUNICATIONS TECHNOLOGY AND COMPUTER SCIENCE (ACCTCS 2021), 2021, : 75 - 79
  • [6] Improving Risk Predictions by Preprocessing Imbalanced Credit Data
    Garcia, Vicente
    Isabel Marques, Ana
    Salvador Sanchez, Jose
    NEURAL INFORMATION PROCESSING, ICONIP 2012, PT II, 2012, 7664 : 68 - 75
  • [7] The study of preprocessing methods' utility in analysis of multidimensional and highly imbalanced medical data
    Werner, Aleksandra
    Bach, Malgorzata
    Pluskiewicz, Wojciech
    PROCEEDINGS OF THE 11TH SCIENTIFIC CONFERENCE INTERNET IN THE INFORMATION SOCIETY 2016, 2016, : 71 - 87
  • [8] Data Preprocessing for ANN-based Industrial Time-Series Forecasting with Imbalanced Data
    Pisa, Ivan
    Santin, Ignacio
    Lopez Vicario, Jose
    Morell, Antoni
    Vilanova, Ramon
    2019 27TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2019
  • [9] Hyperparameter Tuning with High Performance Computing Machine Learning for Imbalanced Alzheimer's Disease Data
    Zhang, Fan
    Petersen, Melissa
    Johnson, Leigh
    Hall, James
    O'Bryant, Sid E.
    APPLIED SCIENCES-BASEL, 2022, 12 (13)
  • [10] New Hybrid Data Preprocessing Technique for Highly Imbalanced Dataset
    Malik, Esraa Faisal
    Khaw, Khai Wah
    Chew, XinYing
    COMPUTING AND INFORMATICS, 2022, 41 (04) : 981 - 1001