Increasing the Performance of Machine Learning-Based IDSs on an Imbalanced and Up-to-Date Dataset

Cited by: 132
Authors
Karatas, Gozde [1]
Demir, Onder [2]
Sahingoz, Ozgur Koray [3]
Affiliations
[1] Istanbul Kultur Univ, Fac Sci & Literature, Dept Math & Comp Sci, TR-34158 Istanbul, Turkey
[2] Marmara Univ, Fac Technol, Dept Comp Engn, TR-34722 Istanbul, Turkey
[3] Istanbul Kultur Univ, Dept Comp Engn, Fac Engn, TR-34158 Istanbul, Turkey
Keywords
IDS; intrusion detection; SMOTE; machine learning; CSE-CIC-IDS2018; imbalanced dataset
DOI
10.1109/ACCESS.2020.2973219
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
In recent years, due to the extensive use of the Internet, the number of networked computers in our daily lives has been increasing. Weaknesses in servers enable hackers to intrude on computers by using not only known but also new attack types, which are more sophisticated and harder to detect. To protect computers from such attacks, an Intrusion Detection System (IDS), trained with machine learning techniques on a pre-collected dataset, is one of the most preferred protection mechanisms. The datasets in use were collected during a limited period on specific networks and generally don't contain up-to-date data. Additionally, they are imbalanced and do not hold sufficient data for all types of attacks. These imbalanced and outdated datasets decrease the efficiency of current IDSs, especially for rarely encountered attack types. In this paper, we propose six machine-learning-based IDSs using the K Nearest Neighbor, Random Forest, Gradient Boosting, Adaboost, Decision Tree, and Linear Discriminant Analysis algorithms. To implement a more realistic IDS, an up-to-date security dataset, CSE-CIC-IDS2018, is used instead of older and heavily studied datasets. The selected dataset is also imbalanced. Therefore, to increase the efficiency of the system across attack types and to decrease missed intrusions and false alarms, the imbalance ratio is reduced by using a synthetic data generation model called Synthetic Minority Oversampling TEchnique (SMOTE). Data generation is performed for the minority classes, and their sizes are increased to the average class size via this technique. Experimental results demonstrated that the proposed approach considerably increases the detection rate for rarely encountered intrusions.
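The oversampling step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes plain Euclidean distance, numeric features only, and a target equal to the rounded mean class size. The function name `smote_to_average`, the parameter `k`, and the toy labels "Benign" and "Botnet" are hypothetical names chosen for illustration.

```python
import math
import random
from collections import Counter, defaultdict


def smote_to_average(X, y, k=5, seed=0):
    """Oversample each class smaller than the mean class size up to that mean.

    Hypothetical helper: a simplified SMOTE-style interpolation, not the
    paper's code. Each synthetic row lies on the segment between a random
    minority sample and one of its k nearest same-class neighbors.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for features, label in zip(X, y):
        by_class[label].append(list(features))

    # Assumed target: the average number of rows per class, rounded.
    target = round(sum(len(rows) for rows in by_class.values()) / len(by_class))
    X_out, y_out = [list(f) for f in X], list(y)

    for label, rows in by_class.items():
        missing = target - len(rows)
        if missing <= 0 or len(rows) < 2:
            continue  # larger classes stay untouched; one row has no neighbor
        for _ in range(missing):
            base = rng.choice(rows)
            # k nearest same-class neighbors of the base row (excluding itself)
            neighbors = sorted(
                (r for r in rows if r is not base),
                key=lambda r: math.dist(base, r),
            )[:k]
            neighbor = rng.choice(neighbors)
            gap = rng.random()  # interpolation factor in [0, 1)
            X_out.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
            y_out.append(label)
    return X_out, y_out


# Toy data (made up): 9 majority rows and 3 minority rows, mean class size 6.
X = [[float(i), 0.0] for i in range(9)] + [[10.0, 10.0], [11.0, 10.0], [10.0, 11.0]]
y = ["Benign"] * 9 + ["Botnet"] * 3
X_new, y_new = smote_to_average(X, y, k=2)
print(Counter(y_new))
```

Only the original rows are used as interpolation endpoints here, so synthetic rows never seed further synthetic rows. In practice, oversampling of this kind is normally applied to the training split only, so that the test set keeps the original class distribution.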
Pages: 32150-32162
Page count: 13