NI-MWMOTE: An improving noise-immunity majority weighted minority oversampling technique for imbalanced classification problems

Cited by: 67
Authors
Wei, Jianan [1]
Huang, Haisong [1]
Yao, Liguo [1,2]
Hu, Yao [1,3]
Fan, Qingsong [1]
Huang, Dong [1]
Affiliations
[1] Guizhou Univ, Key Lab Adv Mfg Technol, Minist Educ, Guiyang 550025, Guizhou, Peoples R China
[2] Yuan Ze Univ, Dept Ind Engn & Management, Taoyuan 32003, Taiwan
[3] Guizhou Renhe Zhiyuan Data Serv Co Ltd, Guiyang 50025, Guizhou, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Imbalanced classification; Noise-immunity; MWMOTE; Clustering; Oversampling; PERFORMANCE; PREDICTION;
DOI
10.1016/j.eswa.2020.113504
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Oversampling techniques are favored by researchers for their simplicity and versatility in dealing with imbalanced classification problems. For oversampling techniques that have appeared in recent years, e.g., the Majority Weighted Minority Oversampling Technique (MWMOTE), noise processing plays an important role because it directly affects the distribution of the new synthetic instances. MWMOTE and many other oversampling techniques use a kNN-based noise processing method. While the kNN method can effectively handle part of the noise when the neighborhood parameter k is set reasonably, it may lead to under-recognition or over-recognition of noise when no prior experience is available to guide that choice. Therefore, we propose an improving noise-immunity majority weighted minority oversampling technique, abbreviated NI-MWMOTE. NI-MWMOTE uses an adaptive noise processing scheme that combines Euclidean distance and neighbor density to rank the probability that suspected noise (identified by the kNN method) is true noise, and then adaptively selects the best noise processing scheme through iteration and misclassification error. Next, the aggregative hierarchical clustering (AHC) method is used to cluster the minority instances. In each sub-cluster, the number of new samples is adaptively determined by classification complexity and cross-validation. NI-MWMOTE not only avoids generating new noise but also effectively overcomes both between-class and within-class imbalance. Results demonstrate that NI-MWMOTE achieves significantly better results on most imbalanced datasets than eight popular oversampling algorithms. (c) 2020 Elsevier Ltd. All rights reserved.
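The kNN-based noise rule that the abstract attributes to MWMOTE and similar methods can be illustrated with a minimal sketch. It assumes the common formulation in which a minority instance is treated as suspected noise when all of its k nearest neighbors belong to the majority class. The function name `suspected_noise` and the toy data are hypothetical, and the sketch does not reproduce NI-MWMOTE's adaptive ranking or scheme selection, which the abstract does not specify in detail.

```python
import numpy as np

def suspected_noise(X, y, minority_label, k=5):
    """Return indices of minority instances whose k nearest neighbours
    (Euclidean distance) all belong to another class."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Pairwise Euclidean distances; an instance is not its own neighbour.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    flagged = []
    for i in np.flatnonzero(y == minority_label):
        nearest = np.argsort(dist[i])[:k]
        if np.all(y[nearest] != minority_label):
            flagged.append(int(i))
    return flagged

# Toy data (assumed for illustration): label 0 = majority, 1 = minority.
X = [
    (0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5),  # majority cluster
    (5, 5), (5, 6), (6, 5), (6, 6),              # minority cluster
    (0.4, 0.6),                                  # minority point inside the majority cluster
    (2.8, 2.8),                                  # minority point between the two clusters
]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]

for k in (1, 3, 6):
    print(k, suspected_noise(X, y, minority_label=1, k=k))
```

On this toy set the result depends strongly on k: a small k also flags the in-between minority point, while a large k flags nothing, which is the over-recognition and under-recognition sensitivity the abstract describes.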
Pages: 22