Imbalanced classification;
Least squares support numerical spectrum;
Minority samples weights;
Oversampling;
k* information nearest neighbors;
SMOTE;
MACHINE;
DOI:
10.1016/j.knosys.2020.106116
Chinese Library Classification:
TP18 [Artificial intelligence theory];
Discipline classification codes:
081104;
0812;
0835;
1405;
Abstract:
Classification is central to machine learning and is widely used in practice; however, imbalanced data poses great challenges to classification, because standard classifiers tend to favor the majority instances and ignore the minority instances. Recent oversampling algorithms (e.g., A-SUWO) based on the improving majority weighted minority oversampling (IMWMO) method assign weights through the Euclidean distances from majority instances to hard-to-learn minority instances, and then guide the synthesis of minority samples according to these weights to address the offset of the classification hyperplanes. When its parameters are well tuned, A-SUWO achieves better results than traditional oversampling algorithms (e.g., SMOTE and MWMOTE). However, A-SUWO may give minority training samples inappropriate weights in some irregularly distributed scenarios and make learning tasks even harder. Additionally, A-SUWO's kNN synthesizing method may fail to generate more widely spread and more effective instances. Therefore, we propose an improving adaptive semi-unsupervised weighted oversampling (IA-SUWO) technique to address imbalanced classification problems more effectively. IA-SUWO improves on A-SUWO in two main respects: (1) it assigns weights to minority instances by jointly considering the least squares support numerical spectrum values and the IMWMO method, and (2) it synthesizes new instances using the k* information nearest neighbors (k*INN) method. IA-SUWO aims to maximize the probability that all important minority samples will be drawn and to generate more efficient (more scattered) boundary instances. Results demonstrate that IA-SUWO achieves significantly better results on most datasets compared with 10 other oversampling algorithms and 2 ensemble algorithms. © 2020 Elsevier B.V. All rights reserved.
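The general idea the abstract builds on, weighting minority samples and then synthesizing new ones by interpolation, can be illustrated with a minimal sketch. This is not the IA-SUWO, A-SUWO, or IMWMO procedure: the function name `weighted_oversample`, the inverse-distance weighting rule, and the plain k-nearest-neighbor interpolation (in place of k*INN and the least squares support numerical spectrum) are all assumptions made only for illustration.

```python
import numpy as np

def weighted_oversample(X_min, X_maj, n_new, k=3, seed=0):
    """Generic weighted, SMOTE-style interpolation (illustrative only).

    Requires at least two minority samples.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    X_maj = np.asarray(X_maj, dtype=float)

    # Assumed weighting rule: minority points closer (in Euclidean distance)
    # to the majority class receive larger sampling weights.
    d_maj = np.linalg.norm(X_min[:, None, :] - X_maj[None, :, :], axis=2).min(axis=1)
    weights = 1.0 / (d_maj + 1e-12)
    probs = weights / weights.sum()

    # Indices of the k nearest minority neighbours of each minority point,
    # excluding the point itself.
    d_min = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d_min, np.inf)
    k = min(k, len(X_min) - 1)
    neighbours = np.argsort(d_min, axis=1)[:, :k]

    # Draw seed points according to the weights, pick one neighbour each,
    # and interpolate at a random position on the connecting segment.
    seeds = rng.choice(len(X_min), size=n_new, p=probs)
    picks = neighbours[seeds, rng.integers(0, k, size=n_new)]
    gaps = rng.random((n_new, 1))
    return X_min[seeds] + gaps * (X_min[picks] - X_min[seeds])
```

Because each synthetic point lies on a segment between two existing minority points, the output stays inside the convex hull of the minority class; methods such as the one described in the abstract differ mainly in how the weights and the neighbors are chosen.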
Affiliation:
Univ Tun Hussein Onn Malaysia, Fac Comp Sci & Informat Technol, Batu Pahat, Malaysia
Ali, Haseeb
Salleh, Mohd Najib Mohd
Affiliation:
Univ Tun Hussein Onn Malaysia, Fac Comp Sci & Informat Technol, Batu Pahat, Malaysia
Salleh, Mohd Najib Mohd
Hussain, Kashif
Affiliation:
Univ Elect Sci & Technol China, Inst Fundamental & Frontier Sci, Chengdu, Sichuan, Peoples R China; Univ Tun Hussein Onn Malaysia, Fac Comp Sci & Informat Technol, Batu Pahat, Malaysia
Affiliation:
Shenzhen Univ, Coll Comp Sci & Software Engn, Big Data Inst, Shenzhen 518060, Peoples R China
Long, Hao
He, Yulin
Affiliation:
Shenzhen Univ, Coll Comp Sci & Software Engn, Big Data Inst, Shenzhen 518060, Peoples R China
He, Yulin
Huang, Joshua Zhexue
Affiliation:
Shenzhen Univ, Coll Comp Sci & Software Engn, Big Data Inst, Shenzhen 518060, Peoples R China
Huang, Joshua Zhexue
Wang, Qiang
Affiliation:
Shenzhen Univ, Coll Comp Sci & Software Engn, Big Data Inst, Shenzhen 518060, Peoples R China
Wang, Qiang
TRENDS AND APPLICATIONS IN KNOWLEDGE DISCOVERY AND DATA MINING, 2017, 10526: 116-128