A Novel Boundary Oversampling Algorithm Based on Neighborhood Rough Set Model: NRSBoundary-SMOTE

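Cited by: 42
Authors
Hu, Feng [1]
Li, Hang [1]
Affiliations
[1] Chongqing Univ Posts & Telecommun, Chongqing Key Lab Computat Intelligence, Chongqing 400065, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
CLASS IMBALANCE; CLASSIFICATION
DOI
10.1155/2013/694809
Chinese Library Classification number
T [Industrial Technology]
Discipline classification code
08
Abstract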
Rough set theory, introduced by Pawlak, is a mathematical tool for dealing with imprecise, uncertain, and vague information. The neighborhood rough set model extends rough set theory and divides a dataset into three parts, one of which is the boundary region, where majority class and minority class samples overlap. Using this knowledge of the distribution of the original dataset, NRSBoundary-SMOTE oversamples only the minority class samples in the boundary region, that is, those that overlap with majority class samples. As a result, it expands the decision space of the minority class while shrinking the decision space of the majority class. In experiments with four kinds of classifiers, NRSBoundary-SMOTE achieves higher accuracy than the other methods when C4.5, CART, and KNN are used, but it performs worse than SMOTE with the SVM classifier.
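The boundary-oversampling idea in the abstract can be illustrated with a minimal sketch. This is not the authors' NRSBoundary-SMOTE algorithm, whose details are not given in this record. It assumes a fixed Euclidean neighborhood radius `delta`, treats a minority sample as "boundary" when at least one majority sample lies within that radius, and generates synthetic points by SMOTE-style interpolation toward nearby minority samples. The function names, `delta`, `k`, and the toy data are all hypothetical.

```python
import numpy as np


def boundary_minority_mask(X, y, minority_label, delta):
    """Flag minority samples with a majority sample within distance delta.

    Assumption: a fixed-radius Euclidean neighborhood stands in for the
    neighborhood rough set boundary region mentioned in the abstract.
    """
    # Pairwise Euclidean distances between all samples.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    is_minority = y == minority_label
    # For every sample: does any majority sample fall inside its neighborhood?
    near_majority = (dist[:, ~is_minority] <= delta).any(axis=1)
    return is_minority & near_majority


def oversample_boundary(X, y, minority_label, delta, n_new, k=3, seed=0):
    """Create n_new synthetic minority points from boundary samples only."""
    rng = np.random.default_rng(seed)
    boundary = X[boundary_minority_mask(X, y, minority_label, delta)]
    minority = X[y == minority_label]
    if len(boundary) == 0 or len(minority) < 2:
        return np.empty((0, X.shape[1]))
    synthetic = []
    for _ in range(n_new):
        base = boundary[rng.integers(len(boundary))]
        order = np.argsort(np.linalg.norm(minority - base, axis=1))
        # order[0] is at distance zero (the base sample itself), so skip it.
        neighbors = order[1:k + 1]
        mate = minority[rng.choice(neighbors)]
        # SMOTE-style interpolation on the segment between base and mate.
        synthetic.append(base + rng.random() * (mate - base))
    return np.array(synthetic)


# Toy data (hypothetical): 3 minority samples (label 1), 5 majority (label 0).
X = np.array([[0.0, 0.0], [0.2, 0.0], [1.0, 0.0],
              [1.2, 0.0], [2.0, 0.0], [2.2, 0.0], [3.0, 0.0], [3.2, 0.0]])
y = np.array([1, 1, 1, 0, 0, 0, 0, 0])
new_points = oversample_boundary(X, y, minority_label=1, delta=0.5, n_new=4)
```

In this toy data only the minority sample at `[1.0, 0.0]` has a majority neighbor within `delta`, so every synthetic point is interpolated from it. Minority samples far from the majority class are never used as a base, which is the contrast with plain SMOTE that the abstract describes.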
Pages: 10