Synthetic minority oversampling technique for multiclass imbalance problems

Citations: 174
Authors
Zhu, Tuanfei [1]
Lin, Yaping [1]
Liu, Yonghe [2]
Affiliations
[1] Hunan Univ, Coll Informat Sci & Engn, 116 Lu Shan South Rd, Changsha, Hunan, Peoples R China
[2] Univ Texas Arlington, Dept Comp Sci & Engn, 701 S Nedderman Dr, Arlington, TX 76019 USA
Funding
National Natural Science Foundation of China;
Keywords
Multiclass imbalance problems; Synthetic minority oversampling; Over generalization; Neighbor directions; ORDINAL REGRESSION; SAMPLING APPROACH; CLASSIFICATION;
DOI
10.1016/j.patcog.2017.07.024
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Multiclass imbalance data learning has attracted increasing interest from the research community. Unfortunately, existing oversampling solutions, when facing this problem, which is more challenging than the two-class imbalance case, have shown their respective deficiencies, such as causing serious over generalization or not actively improving the class imbalance in data space. We propose a k-nearest neighbors (k-NN)-based synthetic minority oversampling algorithm, termed SMOM, to handle multiclass imbalance problems. Unlike previous k-NN-based oversampling algorithms, where for any original minority instance the synthetic instances are randomly generated in the directions of its k-nearest neighbors, SMOM assigns a selection weight to each neighbor direction. The neighbor directions that can produce serious over generalization are given small selection weights. In this way, SMOM forms a mechanism for avoiding over generalization, as the safer neighbor directions are more likely to be selected to yield the synthetic instances. Owing to this, SMOM can aggressively explore the regions of minority classes by configuring a high value for parameter k without causing severe over generalization. Extensive experiments using 27 real-world data sets demonstrate the effectiveness of our algorithm. (C) 2017 Published by Elsevier Ltd.
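The idea of weighting neighbor directions can be illustrated with a minimal Python sketch. This is not the SMOM algorithm itself: the abstract does not say how SMOM computes its selection weights, so the weighting rule used here (penalizing a direction by the number of other-class points inside the ball spanning the instance and its neighbor) is an assumption made for illustration only. The function name `weighted_direction_oversample` and all of its parameters are hypothetical.

```python
import numpy as np

def weighted_direction_oversample(X, y, minority_label, k=5, n_new=10, seed=0):
    """Generate synthetic minority points along weighted neighbor directions.

    ASSUMPTION: the weight 1 / (1 + intruders) is an illustrative proxy for
    "risk of over generalization"; it is not the weighting defined by SMOM.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    X_min = X[y == minority_label]      # instances of the class to oversample
    X_other = X[y != minority_label]    # instances of every other class
    if len(X_min) < 2:
        raise ValueError("need at least two minority instances")
    k = min(k, len(X_min) - 1)

    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        x = X_min[i]

        # Distances to the other minority instances (exclude x itself).
        d = np.linalg.norm(X_min - x, axis=1)
        d[i] = np.inf
        nbr_idx = np.argsort(d)[:k]

        # Hypothetical selection weight per neighbor direction: count the
        # other-class points inside the ball whose diameter is the segment
        # from x to the neighbor; more intruders means a smaller weight.
        weights = np.empty(k)
        for j, n in enumerate(nbr_idx):
            mid = (x + X_min[n]) / 2.0
            radius = d[n] / 2.0
            intruders = np.sum(np.linalg.norm(X_other - mid, axis=1) <= radius)
            weights[j] = 1.0 / (1.0 + intruders)
        probs = weights / weights.sum()

        # Pick a direction according to its weight, then interpolate
        # at a random position between x and the chosen neighbor.
        chosen = X_min[rng.choice(nbr_idx, p=probs)]
        gap = rng.random()
        synthetic.append(x + gap * (chosen - x))
    return np.array(synthetic)
```

With uniform weights this reduces to plain random selection among the k neighbor directions, which is the behavior the abstract attributes to earlier k-NN-based oversamplers. The sketch only changes how a direction is picked, so a larger k widens the set of candidate directions while risky ones stay unlikely to be chosen.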
Pages: 327-340
Page count: 14