Performance Analysis of Binarization Strategies for Multi-class Imbalanced Data Classification

Cited by: 4
Authors
Zak, Michal [1]
Wozniak, Michal [1]
Affiliations
[1] Wroclaw Univ Sci & Technol, Dept Syst & Comp Networks, Wyb Wyspianskiego 27, PL-50370 Wroclaw, Poland
Source
COMPUTATIONAL SCIENCE - ICCS 2020, PT IV | 2020 / Vol. 12140
Keywords
Multi-class classification; Imbalanced data; Binarization strategies; BINARY
DOI
10.1007/978-3-030-50423-6_11
Chinese Library Classification
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Multi-class imbalanced classification tasks are characterized by a skewed distribution of examples among the classes and, usually, strong overlap between class regions in the feature space. Furthermore, the goal of the final system is frequently to obtain very high precision for each of the concepts. All of these factors contribute to the complexity of the task and make it harder for learning algorithms to build a quality data model. One way of addressing these challenges is to use so-called binarization strategies, which decompose the multi-class problem into several binary tasks of lower complexity. Because each of these methods uses a different decomposition scheme, some are considered better suited to handling imbalanced data than others. In this study, we focus on the well-known binary approaches, namely One-Vs-All, One-Vs-One, and Error-Correcting Output Codes, and on their effectiveness in multi-class imbalanced data classification with respect to the base classifiers and the various aggregation schemes for each strategy. We compare the performance of these approaches and try to boost the performance of seemingly weaker methods with sampling algorithms. A detailed comparative experimental study of the considered methods, supported by statistical analysis, is presented. The results show the differences among the binarization strategies, and we show how these differences can be mitigated using simple oversampling methods.
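As a rough illustration of the decomposition the abstract describes, the sketch below splits a multi-class label vector into One-Vs-All and One-Vs-One binary tasks and prints the class counts in each. It is not the authors' experimental setup: the helper names `one_vs_all` and `one_vs_one` and the toy label vector are assumptions made for this example, and Error-Correcting Output Codes and the aggregation step are left out.

```python
from collections import Counter
from itertools import combinations


def one_vs_all(labels):
    """Return {class: binary labels}, with 1 for the class and 0 for the rest."""
    classes = sorted(set(labels))
    return {c: [1 if y == c else 0 for y in labels] for c in classes}


def one_vs_one(labels):
    """Return {(a, b): labels of only the examples that belong to a or b}."""
    classes = sorted(set(labels))
    return {
        (a, b): [y for y in labels if y in (a, b)]
        for a, b in combinations(classes, 2)
    }


# Hypothetical toy label vector with a skewed class distribution.
labels = ["a"] * 90 + ["b"] * 8 + ["c"] * 2

ova = one_vs_all(labels)  # K binary tasks
ovo = one_vs_one(labels)  # K * (K - 1) / 2 binary tasks

for c, binary in ova.items():
    counts = Counter(binary)
    print(f"OvA {c} vs rest: {counts[1]} positive, {counts[0]} negative")

for (a, b), subset in ovo.items():
    counts = Counter(subset)
    print(f"OvO {a} vs {b}: {counts[a]} vs {counts[b]}")
```

On this toy data the One-Vs-All task for class `c` has 2 positives against 98 negatives, while the One-Vs-One task for `b` against `c` has 8 against 2, which shows how the choice of decomposition scheme changes the imbalance each binary learner sees.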
Pages: 141-155
Page count: 15