IA-SUWO: An Improving Adaptive semi-unsupervised weighted oversampling for imbalanced classification problems

Cited by: 29
Authors
Wei Jianan [1]
Huang Haisong [1]
Yao Liguo [1, 2]
Hu Yao [1, 3]
Fan Qingsong [1]
Huang Dong [1]
Affiliations
[1] Guizhou Univ, Key Lab Adv Mfg Technol, Minist Educ, Guiyang 550025, Guizhou, Peoples R China
[2] Yuan Ze Univ, Dept Ind Engn & Management, Taoyuan 32003, Taiwan
[3] Guizhou Renhe Zhiyuan Data Serv Co Ltd, Guiyang 550025, Guizhou, Peoples R China
Keywords
Imbalanced classification; Least squares support numerical spectrum; Minority samples weights; Oversampling; k* information nearest neighbors; SMOTE; MACHINE;
DOI
10.1016/j.knosys.2020.106116
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Classification is a core task of machine learning and is widely used in practice; however, imbalanced data poses great challenges to classification, because standard classifiers tend to favor the majority instances and ignore the minority instances. Recent oversampling algorithms (e.g., A-SUWO) based on the improving majority weighted minority oversampling (IMWMO) method assign weights according to the Euclidean distances from majority instances to hard-to-learn minority instances, and then use these weights to guide the synthesis of minority samples in order to correct the offset of the classification hyperplane. When its parameters are well tuned, A-SUWO achieves better results than traditional oversampling algorithms (e.g., SMOTE and MWMOTE). However, A-SUWO may assign inappropriate weights to minority training samples in some irregularly distributed scenarios, making the learning task even harder. Additionally, A-SUWO's k-nearest-neighbor (kNN) synthesis method may fail to produce more widely spread and more effective instances. Therefore, we propose an improving adaptive semi-unsupervised weighted oversampling (IA-SUWO) technique to address imbalanced classification problems more effectively. IA-SUWO improves on A-SUWO in two main respects: (1) it assigns weights to minority instances by jointly considering the least squares support numerical spectrum values and the IMWMO method, and (2) it synthesizes new instances using the k* information nearest neighbors (k*INN) method. IA-SUWO aims to maximize the probability that all important minority samples are drawn and to generate more effective (more scattered) boundary instances. Experimental results demonstrate that IA-SUWO achieves significantly better results on most datasets than 10 other oversampling algorithms and 2 ensemble algorithms. © 2020 Elsevier B.V. All rights reserved.
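The general idea of distance-based weighting described in the abstract can be illustrated with a minimal sketch. This is not IA-SUWO or A-SUWO: it omits clustering, the least squares support numerical spectrum, and k*INN, and substitutes plain SMOTE-style interpolation between minority neighbors. The function name weighted_oversample, its parameters, and the inverse-distance weighting rule are assumptions made for illustration only.

```python
import numpy as np

def weighted_oversample(X_min, X_maj, n_new, k=3, seed=0):
    """Illustrative distance-weighted, SMOTE-style oversampling (hypothetical helper).

    Requires k <= len(X_min) - 1.
    """
    rng = np.random.default_rng(seed)
    # Euclidean distance from each minority sample to its nearest majority sample.
    d_maj = np.linalg.norm(X_min[:, None, :] - X_maj[None, :, :], axis=2).min(axis=1)
    # Assumption: samples closer to the majority class are harder to learn,
    # so they receive a larger sampling weight (inverse distance, normalized).
    w = 1.0 / (d_maj + 1e-12)
    w /= w.sum()
    # k nearest minority neighbors of each minority sample, excluding itself.
    d_min = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d_min, np.inf)
    nn = np.argsort(d_min, axis=1)[:, :k]
    # Draw seed samples according to the weights, pick a random neighbor for each,
    # and interpolate at a random position on the segment between them.
    seeds = rng.choice(len(X_min), size=n_new, p=w)
    partners = nn[seeds, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))
    X_new = X_min[seeds] + gap * (X_min[partners] - X_min[seeds])
    return X_new, w
```

Because every synthetic point is a convex combination of two minority samples, the output stays inside the region spanned by the minority class; the weights only change which minority samples are chosen as seeds more often.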
Pages: 19