Improving Intrusion Detection Through Training Data Augmentation

Cited by: 5
Authors
Otokwala, Uneneibotejit [1]
Petrovski, Andrei [1]
Kalutarage, Harsha [1]
Affiliation
[1] Robert Gordon Univ, Sch Comp, Aberdeen, Scotland
Source
2021 14TH INTERNATIONAL CONFERENCE ON SECURITY OF INFORMATION AND NETWORKS (SIN 2021) | 2021
Keywords
Imbalanced data; Minority oversampling; Data augmentation; Intrusion detection;
DOI
10.1109/SIN54109.2021.9699293
Abstract
Class imbalance is a common problem in security datasets, and several strategies, such as class resampling and cost-sensitive training, have been proposed to address it. In this paper, we propose a data augmentation strategy to oversample the minority classes in the dataset. Using our Sort-Augment-Combine (SAC) technique, we split the dataset into subsets by class label and then generate synthetic data from each subset. The synthetic data are then used to oversample the minority classes. Once oversampling is complete, the per-class subsets are combined to form an augmented training set for model fitting. Measured by accuracy, recall (sensitivity) and specificity (true negative rate), the models trained on the augmented datasets show improved performance over models trained on the original dataset. Similarly, on a binary-class dataset, SAC performed optimally, and the combination of SAC and ROSE shows an improvement in overall accuracy, sensitivity and specificity when compared with the performance of the Random Forest model on the original dataset and on the ROSE- and SMOTE-augmented datasets.
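The three steps named in the abstract (sort by class, augment the minority classes, combine) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the synthetic data are generated, so the generator here (resampling a row from the class and adding small Gaussian jitter scaled by the per-feature standard deviation) is an assumption, as are the function name `sort_augment_combine` and the parameters `noise_scale` and `seed`.

```python
import random
from collections import defaultdict


def sort_augment_combine(rows, labels, noise_scale=0.05, seed=0):
    """Hypothetical SAC-style oversampling sketch (not the paper's code)."""
    rng = random.Random(seed)

    # Sort: split the dataset into one subset per class label.
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(list(row))

    # Every class is oversampled up to the size of the largest class.
    target = max(len(subset) for subset in by_class.values())

    combined = []
    for label, subset in by_class.items():
        n = len(subset)
        columns = list(zip(*subset))
        means = [sum(col) / n for col in columns]
        stds = [
            (sum((v - m) ** 2 for v in col) / n) ** 0.5
            for col, m in zip(columns, means)
        ]

        # Augment: draw a real row from this class and perturb each feature
        # with Gaussian jitter (assumed generator, see the note above).
        synthetic = []
        for _ in range(target - n):
            base = rng.choice(subset)
            synthetic.append(
                [v + noise_scale * s * rng.gauss(0.0, 1.0)
                 for v, s in zip(base, stds)]
            )

        combined.extend((row, label) for row in subset + synthetic)

    # Combine: merge the per-class subsets and shuffle them.
    rng.shuffle(combined)
    aug_rows = [row for row, _ in combined]
    aug_labels = [label for _, label in combined]
    return aug_rows, aug_labels
```

Calling `sort_augment_combine(rows, labels)` on a dataset with four majority rows and one minority row would return eight rows, four per class. Only the training split should be augmented in this way; evaluation data should keep its original class distribution so that sensitivity and specificity remain meaningful.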
Pages: 8