Machine-Learning Approach to Optimize SMOTE Ratio in Class Imbalance Dataset for Intrusion Detection

Cited by: 43
Authors
Seo, Jae-Hyun [1]
Kim, Yong-Hyuk [2]
Affiliations
[1] Wonkwang Univ, Dept Comp Sci & Engn, 460 Iksandae Ro, Iksan Si 54649, Jeonbuk, South Korea
[2] Kwangwoon Univ, Sch Software, 20 Kwangwoon Ro, Seoul 01897, South Korea
Keywords
Data mining - Machine learning - Probes - Denial-of-service attack - Statistical tests
DOI
10.1155/2018/9704672
Chinese Library Classification
Q [Biological Sciences]
Subject classification codes
07; 0710; 09
Abstract
The KDD CUP 1999 intrusion detection dataset was introduced at the third International Knowledge Discovery and Data Mining Tools Competition and has been widely used in many studies. The attack types in the KDD CUP 1999 dataset are divided into four categories: user to root (U2R), remote to local (R2L), denial of service (DoS), and Probe. Adding the normal class gives five classes. We define the U2R, R2L, and Probe classes, each of which accounts for less than 1% of the total dataset, as rare classes. In this study, we attempt to mitigate the class imbalance of the dataset. Using the synthetic minority oversampling technique (SMOTE), we attempted to optimize the SMOTE ratios for the rare classes (U2R, R2L, and Probe). We randomly generated a number of tuples of SMOTE ratios and used them to build a numerical model for optimizing the SMOTE ratios of the rare classes; support vector regression was used to create the model. We assigned each instance in the test dataset to the model and chose the best SMOTE ratios. The machine-learning experiments were then conducted using the best ratios. The results of the proposed method were significantly better than those of the previous approach and other related work.
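The oversampling step the abstract refers to can be illustrated with a minimal pure-Python sketch of SMOTE applied with a per-class ratio. This is not the authors' implementation: the function names `smote` and `oversample`, the toy two-feature data, and the convention that a ratio r adds round(r × original count) synthetic samples are all assumptions made for illustration. The support vector regression model that the paper uses to choose the ratios is not shown.

```python
import math
import random


def smote(minority, n_synthetic, k=5, rng=None):
    """Create n_synthetic points, each interpolated between a randomly chosen
    minority sample and one of its k nearest minority neighbours."""
    rng = rng or random.Random(0)
    if n_synthetic == 0:
        return []
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position on the segment from base to neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def oversample(data, ratios, k=5, seed=0):
    """data: {class_name: [feature tuples]}, ratios: {class_name: ratio}.
    Assumed convention: a ratio r adds round(r * original_count) samples;
    classes missing from ratios are left unchanged."""
    rng = random.Random(seed)
    result = {}
    for label, samples in data.items():
        extra = round(ratios.get(label, 0.0) * len(samples))
        result[label] = list(samples) + smote(samples, extra, k, rng)
    return result


# Toy data (hypothetical values, not taken from KDD CUP 1999).
data = {
    "normal": [(0.0, 0.0), (1.0, 1.0), (0.5, 0.2), (0.9, 0.1)],
    "U2R": [(5.0, 5.0), (6.0, 5.0), (5.0, 6.0)],
}
ratios = {"U2R": 2.0}  # one candidate ratio; the paper searches over tuples
balanced = oversample(data, ratios, k=2)
print({label: len(samples) for label, samples in balanced.items()})
```

In a search over ratio tuples, a call like `oversample` would be repeated once per candidate tuple (one ratio each for U2R, R2L, and Probe), and a classifier's score on each resampled training set would supply the targets for the regression model.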
Pages: 11