Globalized Multiple Balanced Subsets With Collaborative Learning for Imbalanced Data

Cited by: 13
Authors
Zhu, Zonghai [1 ,2 ]
Wang, Zhe [1 ,2 ]
Li, Dongdong [2 ]
Du, Wenli [1 ]
Affiliations
[1] East China Univ Sci & Technol, Minist Educ, Key Lab Adv Control & Optimizat Chem Proc, Shanghai 200237, Peoples R China
[2] East China Univ Sci & Technol, Dept Comp Sci & Engn, Shanghai 200237, Peoples R China
Funding
US National Science Foundation (NSF);
Keywords
Balanced bagging; collaborative learning; imbalanced data; multiple balanced subsets; regularized learning; CLASSIFICATION; REPRESENTATION; MULTICLASS;
DOI
10.1109/TCYB.2020.3001158
Chinese Library Classification
TP [Automation Technology; Computer Technology];
Discipline Classification Code
0812 ;
Abstract
The skewed distribution of imbalanced data makes it difficult to classify minority and majority samples. Balanced bagging randomly undersamples the majority class several times and combines each selection of majority samples with the minority samples to form several balanced subsets, in which the numbers of minority and majority samples are roughly equal. However, balanced bagging lacks a unified learning framework. Moreover, it fails to consider the connections among the subsets and the global information of the entire data distribution. To this end, this article places the balanced subsets into an effective learning framework with a criterion function. Within this framework, one regularization term, called R-S, establishes the connection among the subsets and realizes their collaborative learning by requiring consistent outputs for the minority samples across different subsets. In addition, another regularization term, called R-W, provides global information to each basic classifier by reducing the difference between the direction of the solution vector in each subset and that in the entire dataset. The proposed learning framework is called globalized multiple balanced subsets with collaborative learning (GMBSCL). Experimental results validate the effectiveness of the proposed GMBSCL.
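The balanced-bagging step that GMBSCL builds on can be illustrated with a minimal sketch. This does not reproduce the paper's R-S or R-W regularizers or its criterion function; it only shows how several balanced subsets are formed by repeated random undersampling of the majority class, with all function and parameter names being illustrative assumptions.

```python
import numpy as np

def balanced_subsets(X, y, n_subsets=5, minority_label=1, seed=0):
    """Randomly undersample the majority class n_subsets times and
    pair each draw with all minority samples (balanced bagging)."""
    rng = np.random.default_rng(seed)
    min_idx = np.flatnonzero(y == minority_label)
    maj_idx = np.flatnonzero(y != minority_label)
    subsets = []
    for _ in range(n_subsets):
        # Draw as many majority samples as there are minority samples,
        # so each subset is roughly class-balanced.
        drawn = rng.choice(maj_idx, size=min_idx.size, replace=False)
        idx = np.concatenate([min_idx, drawn])
        subsets.append((X[idx], y[idx]))
    return subsets
```

In GMBSCL, one base classifier would then be trained per subset, with the R-S term tying the classifiers' minority-sample outputs together and the R-W term aligning each subset's solution vector with the direction obtained on the whole dataset.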
Pages: 2407-2417
Number of pages: 11