Synthetic minority oversampling technique for multiclass imbalance problems

Cited by: 174
Authors
Zhu, Tuanfei [1]
Lin, Yaping [1]
Liu, Yonghe [2]
Affiliations
[1] Hunan Univ, Coll Informat Sci & Engn, 116 Lu Shan South Rd, Changsha, Hunan, Peoples R China
[2] Univ Texas Arlington, Dept Comp Sci & Engn, 701 S Nedderman Dr, Arlington, TX 76019 USA
Funding
National Natural Science Foundation of China
Keywords
Multiclass imbalance problems; Synthetic minority oversampling; Over generalization; Neighbor directions; ORDINAL REGRESSION; SAMPLING APPROACH; CLASSIFICATION;
DOI
10.1016/j.patcog.2017.07.024
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Multiclass imbalance data learning has attracted increasing interest from the research community. Unfortunately, existing oversampling solutions, when facing this problem, which is more challenging than the two-class imbalance case, have shown their respective deficiencies, such as causing serious over generalization or not actively improving the class imbalance in data space. We propose a k-nearest neighbors (k-NN)-based synthetic minority oversampling algorithm, termed SMOM, to handle multiclass imbalance problems. Unlike previous k-NN-based oversampling algorithms, in which the synthetic instances for any original minority instance are randomly generated in the directions of its k nearest neighbors, SMOM assigns a selection weight to each neighbor direction. Neighbor directions that can produce serious over generalization are given small selection weights. In this way, SMOM forms a mechanism for avoiding over generalization, as the safer neighbor directions are more likely to be selected to yield the synthetic instances. Owing to this, SMOM can aggressively explore the regions of minority classes by configuring a high value for parameter k without causing severe over generalization. Extensive experiments using 27 real-world data sets demonstrate the effectiveness of our algorithm. (C) 2017 Published by Elsevier Ltd.
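The idea of weighting neighbor directions can be illustrated with a minimal Python sketch. This is not the SMOM algorithm: the abstract does not give the weight formula, so the weighting rule here is an assumption made purely for illustration (a direction is penalized by the number of other-class points that fall inside the ball whose diameter is the segment between the instance and its neighbor). The names `direction_weights` and `oversample` are hypothetical.

```python
import math
import random


def direction_weights(x, neighbors, others):
    """Return normalized selection weights, one per neighbor direction.

    ASSUMPTION (not the paper's formula): a direction x -> neighbor is
    down-weighted by the count of other-class points inside the ball
    whose diameter is the segment from x to that neighbor.
    """
    raw = []
    for nb in neighbors:
        mid = [(a + b) / 2.0 for a, b in zip(x, nb)]
        radius = math.dist(x, nb) / 2.0
        intruders = sum(1 for o in others if math.dist(o, mid) <= radius)
        raw.append(1.0 / (1.0 + intruders))
    total = sum(raw)
    return [w / total for w in raw]


def oversample(minority, others, k=5, n_new=10, seed=0):
    """Generate n_new synthetic minority points (needs >= 2 minority points)."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority neighbors of x, excluding x itself
        rest = [m for j, m in enumerate(minority) if j != i]
        neighbors = sorted(rest, key=lambda m: math.dist(x, m))[:k]
        # pick a direction with probability proportional to its weight
        weights = direction_weights(x, neighbors, others)
        target = rng.choices(neighbors, weights=weights, k=1)[0]
        # SMOTE-style interpolation along the chosen direction
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(x, target)])
    return synthetic
```

With uniform weights this reduces to plain SMOTE-style random direction selection. With non-uniform weights, a larger `k` widens the set of candidate directions while the directions that cross other-class regions are chosen less often.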
Pages: 327-340
Page count: 14