Multiclass imbalanced learning with one-versus-one decomposition and spectral clustering

Cited by: 47
Authors
Li, Qianmu [1, 2, 3, 4, 5, 6]
Song, Yanjun [1, 7]
Zhang, Jing [1]
Sheng, Victor S. [3]
Affiliations
[1] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Nanjing 210094, Peoples R China
[2] Wuyi Univ, Intelligent Mfg Dept, Jiangmen 529020, Peoples R China
[3] Texas Tech Univ, Dept Comp Sci, Lubbock, TX 79409 USA
[4] Nanjing XiaoZhuang Univ, Nanjing 211171, Peoples R China
[5] Jinling Inst Technol, Nanjing 211169, Peoples R China
[6] Jiangsu Zhongtian Technol Co Ltd, Nantong 226463, Peoples R China
[7] Nanjing Liancheng Technol Dev Co Ltd, Nanjing 210008, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Imbalanced learning; Multiclass classification; One-versus-one decomposition; Spectral clustering; DATA-SETS; CLASSIFICATION; SMOTE;
DOI
10.1016/j.eswa.2019.113152
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In many real-world applications, an algorithm needs to learn multiclass classification models from data with imbalanced class distributions. Multiclass imbalanced learning is currently receiving increased attention from researchers. In contrast to traditional imbalanced learning on binary datasets, multiclass imbalanced learning faces great challenges from the variety of changes in the class distributions as well as the inadequate performance of multiclass classification algorithms. In this paper, we propose a novel data preprocessing-based method to solve this problem. The proposed method combines a one-versus-one (OVO) decomposition of class pairs and a spectral clustering technique. This method first decomposes a multiclass dataset into several binary-class datasets. Then, it uses spectral clustering to divide the minority classes of binary-class subsets into subspaces and oversamples them according to the characteristics of the data. Sampling based on spectral clustering takes into account the distribution of the data and effectively avoids oversampling outliers. After the data approximately reaches the equilibrium point, multiclass classifiers can be trained from these rebalanced data. We compared the proposed method with five state-of-the-art multiclass imbalanced learning methods on seven multiclass datasets, using multiclass area under the ROC curve (MAUC), the precision of minor classes (P-min) and the average precision of all classes (P-avg) as the performance metrics. The experimental results show that our proposed method has the best overall performance. (C) 2019 Elsevier Ltd. All rights reserved.
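The first step described in the abstract, splitting a multiclass dataset into one binary subset per class pair, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names `ovo_decompose` and `minority_and_deficit` and the toy data are hypothetical, and the spectral clustering and oversampling steps are deliberately left out. For K classes the decomposition yields K(K-1)/2 binary subsets.

```python
from collections import Counter
from itertools import combinations


def ovo_decompose(X, y):
    """Return one binary subset per unordered class pair (hypothetical helper).

    Each subset is a list of (features, label) rows restricted to two classes.
    """
    subsets = {}
    for a, b in combinations(sorted(set(y)), 2):
        subsets[(a, b)] = [(x, label) for x, label in zip(X, y) if label in (a, b)]
    return subsets


def minority_and_deficit(rows):
    """Return the smaller class of a binary subset and how many samples
    it would need to match the larger class (hypothetical helper)."""
    counts = Counter(label for _, label in rows)
    (_, n_major), (minor, n_minor) = counts.most_common(2)
    return minor, n_major - n_minor


# Toy imbalanced dataset (assumed for illustration): 6 / 3 / 1 samples.
X = [[float(i)] for i in range(10)]
y = ["a"] * 6 + ["b"] * 3 + ["c"] * 1

for pair, rows in ovo_decompose(X, y).items():
    minor, deficit = minority_and_deficit(rows)
    print(pair, "minority:", minor, "samples to add:", deficit)
```

In the method the abstract describes, the per-pair deficit would be filled by oversampling within spectral-clustering subspaces of the minority class rather than by uniform duplication; that part is not shown here.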
Pages: 14