Classification of Imbalanced Data by Combining the Complementary Neural Network and SMOTE Algorithm

Cited by: 130
Authors
Jeatrakul, Piyasak [1]
Wong, Kok Wai [1]
Fung, Chun Che [1]
Affiliation
[1] Murdoch Univ, Sch Informat Technol, Murdoch, WA 6150, Australia
Source
NEURAL INFORMATION PROCESSING: MODELS AND APPLICATIONS, PT II | 2010 / Vol. 6444
Keywords
Class imbalanced problem; artificial neural network; complementary neural network; classification; misclassification analysis
DOI
10.1007/978-3-642-17534-3_19
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In classification, when the training data are unevenly distributed among classes, the learning algorithm is generally dominated by the features of the majority classes, and the features of the minority classes are difficult to recognize fully. This paper proposes a method to improve classification accuracy for the minority classes. The method combines the Synthetic Minority Over-sampling Technique (SMOTE) and the Complementary Neural Network (CMTNN) to handle the problem of classifying imbalanced data. To demonstrate that the proposed technique can assist classification of imbalanced data, three classification algorithms are used: Artificial Neural Network (ANN), k-Nearest Neighbor (k-NN), and Support Vector Machine (SVM). Benchmark data sets with various ratios between the minority class and the majority class are obtained from the University of California Irvine (UCI) machine learning repository. The results show that the proposed combination techniques can improve performance on the class imbalance problem.
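As an illustration of the over-sampling step named in the abstract, the following is a minimal sketch of SMOTE-style interpolation, not the authors' implementation. It assumes numeric feature tuples and picks base points at random rather than iterating over every minority sample as the original SMOTE formulation does. The function name `smote`, its parameters, and the toy data are hypothetical. The CMTNN stage is not shown because this record does not describe it.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples
    and one of their k nearest minority-class neighbours (simplified SMOTE)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        # The new point lies on the segment between base and neighbour.
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical toy minority class with two features per sample.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_synthetic=6, k=2)
print(len(new_points))
```

Because each synthetic point is a convex combination of two existing minority samples, it stays inside the region already occupied by the minority class, which is the property that distinguishes this approach from duplicating samples.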
Pages: 152-159
Page count: 8