Machine Learning in Network Intrusion Detection: A Cross-Dataset Generalization Study

Cited by: 1
Authors
Cantone, Marco [1]
Marrocco, Claudio [1]
Bria, Alessandro [1]
Affiliations
[1] Univ Cassino & Southern Latium, Dept Elect & Informat Engn, I-03043 Cassino, Italy
Keywords
Feature extraction; Training; Machine learning; Data models; Labeling; Federated learning; Network intrusion detection; Knowledge based systems; Bayes methods; Telecommunication traffic; CIC-IDS2017; cross-dataset; CSE-CIC-IDS2018; generalization; intrusion detection system; machine learning
DOI
10.1109/ACCESS.2024.3472907
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Network Intrusion Detection Systems (NIDS) are a fundamental tool in cybersecurity. Their ability to generalize across diverse networks is a critical factor in their effectiveness and a prerequisite for real-world applications. In this study, we conduct a comprehensive analysis of the generalization of machine-learning-based NIDS through extensive experimentation in a cross-dataset framework. We employ four machine learning classifiers and four datasets acquired from different networks: CIC-IDS-2017, CSE-CIC-IDS2018, LycoS-IDS2017, and LycoS-Unicas-IDS2018. Notably, the last dataset is a novel contribution, in which we apply corrections based on LycoS-IDS2017 to the well-known CSE-CIC-IDS2018 dataset. The results show nearly perfect classification performance when the models are trained and tested on the same dataset. However, when the models are trained and tested in a cross-dataset fashion, the classification accuracy is largely commensurate with random chance, except for a few combinations of attacks and datasets. We employ data visualization techniques to provide insights into the patterns in the data. Our analysis reveals anomalies in the data that directly hinder the classifiers' capability to generalize the learned knowledge to new scenarios. This study enhances our understanding of the generalization capabilities of machine-learning-based NIDS and highlights the importance of acknowledging data heterogeneity.
Pages: 144489-144508
Number of pages: 20
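
The cross-dataset protocol summarized in the abstract (train on traffic from one network, test on traffic from another) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic two-feature "flows", the nearest-centroid classifier is a stand-in and not one of the four classifiers used in the study, and the names `make_flows`, `fit_centroids`, `predict`, and `accuracy` are hypothetical. The two synthetic networks are built so that the attack class separates along different features, which is one simple way that same-dataset accuracy can be high while cross-dataset accuracy falls to roughly chance level.

```python
import random


def make_flows(n, benign_mean, attack_mean, seed):
    """Generate n synthetic 2-feature flows labeled 0 (benign) or 1 (attack)."""
    rng = random.Random(seed)
    flows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        mean = attack_mean if label == 1 else benign_mean
        features = [rng.gauss(m, 1.0) for m in mean]
        flows.append((features, label))
    return flows


def fit_centroids(flows):
    """Compute the per-class mean feature vector (nearest-centroid model)."""
    sums = {0: [0.0, 0.0], 1: [0.0, 0.0]}
    counts = {0: 0, 1: 0}
    for features, label in flows:
        counts[label] += 1
        for i, value in enumerate(features):
            sums[label][i] += value
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def predict(centroids, features):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(centroid):
        return sum((f - c) ** 2 for f, c in zip(features, centroid))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def accuracy(centroids, flows):
    """Fraction of flows whose predicted label matches the true label."""
    correct = sum(predict(centroids, f) == y for f, y in flows)
    return correct / len(flows)


# Hypothetical network A: attacks differ from benign traffic on feature 0.
train_a = make_flows(2000, (0.0, 0.0), (6.0, 0.0), seed=1)
test_a = make_flows(2000, (0.0, 0.0), (6.0, 0.0), seed=2)
# Hypothetical network B: attacks differ from benign traffic on feature 1.
test_b = make_flows(2000, (3.0, -3.0), (3.0, 3.0), seed=3)

model_a = fit_centroids(train_a)
within_acc = accuracy(model_a, test_a)  # train on A, test on A
cross_acc = accuracy(model_a, test_b)   # train on A, test on B

print(f"same-dataset accuracy:  {within_acc:.3f}")
print(f"cross-dataset accuracy: {cross_acc:.3f}")
```

In this toy setup the model trained on network A learns a decision boundary on feature 0, which carries no class information in network B, so its predictions there are close to a coin flip. Real NIDS datasets differ in more complex ways, so this is only a schematic of the evaluation design, not a reproduction of the study's findings.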