IA-SUWO: An Improving Adaptive semi-unsupervised weighted oversampling for imbalanced classification problems

Citations: 28
Authors
Wei Jianan [1]
Huang Haisong [1]
Yao Liguo [1, 2]
Hu Yao [1, 3]
Fan Qingsong [1]
Huang Dong [1]
Affiliations
[1] Guizhou Univ, Key Lab Adv Mfg Technol, Minist Educ, Guiyang 550025, Guizhou, Peoples R China
[2] Yuan Ze Univ, Dept Ind Engn & Management, Taoyuan 32003, Taiwan
[3] Guizhou Renhe Zhiyuan Data Serv Co Ltd, Guiyang 550025, Guizhou, Peoples R China
Keywords
Imbalanced classification; Least squares support numerical spectrum; Minority samples weights; Oversampling; k* information nearest neighbors; SMOTE; MACHINE;
DOI
10.1016/j.knosys.2020.106116
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
As a core task of machine learning, classification is widely used in practice; however, imbalanced data poses great challenges to classification, because standard classifiers tend to favor the majority instances and ignore the minority instances. Recent oversampling algorithms (e.g., A-SUWO) based on the improving majority weighted minority oversampling (IMWMO) method assign weights using the Euclidean distances from majority instances to hard-to-learn minority instances, and then use these weights to guide the synthesis of minority samples in order to address the offset of the classification hyperplane. When its parameters are well tuned, A-SUWO achieves better results than traditional oversampling algorithms (e.g., SMOTE and MWMOTE). However, A-SUWO may assign inappropriate weights to minority training samples in some irregularly distributed scenarios, making the learning task even harder. Additionally, A-SUWO's kNN synthesizing method may not obtain wider and more effective instances. Therefore, we propose an improving adaptive semi-unsupervised weighted oversampling (IA-SUWO) technique to address imbalanced classification problems more effectively. IA-SUWO improves on A-SUWO in two main respects: (1) it combines the least squares support numerical spectrum values with the IMWMO method to assign weights to minority instances, and (2) it synthesizes new instances using the k* information nearest neighbors (k*INN) method. IA-SUWO aims to maximize the probability that all important minority samples are drawn, and it generates more efficient (more scattered) boundary instances. Results demonstrate that IA-SUWO achieves significantly better results on most datasets than 10 other oversampling algorithms and 2 ensemble algorithms. (C) 2020 Elsevier B.V. All rights reserved.
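The general idea in the abstract of weighting minority instances by their Euclidean distance to majority instances, and then using those weights to guide synthesis, can be illustrated with a minimal sketch. This is not IA-SUWO or A-SUWO: it omits clustering, the least squares support numerical spectrum, and k*INN. The function name `weighted_oversample`, the inverse-distance weighting rule, and the toy data are all assumptions made for illustration only.

```python
import numpy as np

def weighted_oversample(X_min, X_maj, n_new, k=3, seed=0):
    """Illustrative distance-weighted, SMOTE-style oversampling (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    X_maj = np.asarray(X_maj, dtype=float)

    # Euclidean distance from each minority sample to its nearest majority sample.
    d_maj = np.linalg.norm(X_min[:, None, :] - X_maj[None, :, :], axis=2).min(axis=1)

    # Assumed weighting rule: the closer to the majority class, the larger the weight.
    weights = 1.0 / (d_maj + 1e-12)
    probs = weights / weights.sum()

    # Indices of the k nearest minority neighbours of each minority sample (self excluded).
    d_min = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d_min, np.inf)
    k = min(k, len(X_min) - 1)
    neighbours = np.argsort(d_min, axis=1)[:, :k]

    # Draw seed samples according to the weights, then interpolate
    # linearly towards a randomly chosen minority neighbour.
    seeds = rng.choice(len(X_min), size=n_new, p=probs)
    picks = neighbours[seeds, rng.integers(0, k, size=n_new)]
    gaps = rng.random((n_new, 1))
    synthetic = X_min[seeds] + gaps * (X_min[picks] - X_min[seeds])
    return synthetic, probs

# Toy data (assumed): four minority points in the unit square, three majority points.
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
X_maj = np.array([[3.0, 3.0], [4.0, 3.0], [3.0, 4.0]])
synthetic, probs = weighted_oversample(X_min, X_maj, n_new=5)
```

In this toy setup the minority point nearest the majority class receives the largest sampling probability, so synthetic points tend to concentrate near the class boundary. The sketch needs at least two minority samples.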
Pages: 19
Related papers
44 in total
[1]   A proposal for evolutionary fuzzy systems using feature weighting: Dealing with overlapping in imbalanced datasets [J].
Alshomrani, Saleh ;
Bawakid, Abdullah ;
Shim, Seong-O ;
Fernandez, Alberto ;
Herrera, Francisco .
KNOWLEDGE-BASED SYSTEMS, 2015, 73 :1-17
[2]   [Anonymous], 2006, GESTS International Transactions on Computer Science and Engineering
[3]   MWMOTE-Majority Weighted Minority Oversampling Technique for Imbalanced Data Set Learning [J].
Barua, Sukarna ;
Islam, Md. Monirul ;
Yao, Xin ;
Murase, Kazuyuki .
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2014, 26 (02) :405-425
[4]   FSVM-CIL: Fuzzy Support Vector Machines for Class Imbalance Learning [J].
Batuwita, Rukshan ;
Palade, Vasile .
IEEE TRANSACTIONS ON FUZZY SYSTEMS, 2010, 18 (03) :558-571
[5]   An empirical comparison on state-of-the-art multi-class imbalance learning algorithms and a new diversified ensemble learning scheme [J].
Bi, Jingjun ;
Zhang, Chongsheng .
KNOWLEDGE-BASED SYSTEMS, 2018, 158 :81-93
[6]   DBSMOTE: Density-Based Synthetic Minority Over-sampling TEchnique [J].
Bunkhumpornpat, Chumphol ;
Sinapiromsaran, Krung ;
Lursinsap, Chidchanok .
APPLIED INTELLIGENCE, 2012, 36 (03) :664-684
[7]   Bunkhumpornpat C, 2009, LECT NOTES ARTIF INT, V5476, P475, DOI 10.1007/978-3-642-01307-2_43
[8]   l2,1 norm regularized multi-kernel based joint nonlinear feature selection and over-sampling for imbalanced data classification [J].
Cao, Peng ;
Liu, Xiaoli ;
Zhang, Jian ;
Zhao, Dazhe ;
Huang, Min ;
Zaiane, Osmar .
NEUROCOMPUTING, 2017, 234 :38-57
[9]   SMOTE: Synthetic minority over-sampling technique [J].
Chawla, Nitesh V. ;
Bowyer, Kevin W. ;
Hall, Lawrence O. ;
Kegelmeyer, W. Philip .
2002, American Association for Artificial Intelligence (16)
[10]   Chen S, 2017, THESIS NATL TSING HU, P1