CCR: A COMBINED CLEANING AND RESAMPLING ALGORITHM FOR IMBALANCED DATA CLASSIFICATION

Cited by: 82
Authors
Koziarski, Michal [1]
Wozniak, Michal [1]
Affiliations
[1] Wroclaw Univ Sci & Technol, Dept Syst & Comp Networks, Wybrzeze Wyspianskiego 27, PL-50370 Wroclaw, Poland
Keywords
machine learning; classification; imbalanced data; preprocessing; oversampling; sampling approach; data sets
DOI
10.1515/amcs-2017-0050
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Imbalanced data classification is one of the most widespread challenges in contemporary pattern recognition. Varying levels of imbalance may be observed in most real datasets, affecting the performance of classification algorithms. In particular, high levels of imbalance pose serious difficulties, often requiring the use of specially designed methods. In such cases the most important issue is often to properly detect minority examples, but at the same time the performance on the majority class cannot be neglected. In this paper we describe a novel resampling technique focused on proper detection of minority examples in a two-class imbalanced data task. The proposed method combines cleaning the decision border around minority objects with guided synthetic oversampling. Results of the conducted experimental study indicate that the proposed algorithm usually outperforms conventional oversampling approaches, especially when the detection of minority examples is considered.
Pages: 727-736
Page count: 10