Machine Learning in Network Intrusion Detection: A Cross-Dataset Generalization Study

Cited by: 1
Authors
Cantone, Marco [1]
Marrocco, Claudio [1]
Bria, Alessandro [1]
Affiliations
[1] Univ Cassino & Southern Latium, Dept Elect & Informat Engn, I-03043 Cassino, Italy
Keywords
Feature extraction; Training; Machine learning; Data models; Labeling; Federated learning; Network intrusion detection; Knowledge based systems; Bayes methods; Telecommunication traffic; CIC-IDS2017; cross-dataset; CSE-CIC-IDS2018; generalization; intrusion detection system; machine learning
DOI
10.1109/ACCESS.2024.3472907
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Network Intrusion Detection Systems (NIDS) are a fundamental tool in cybersecurity. Their ability to generalize across diverse networks is critical to their effectiveness and a prerequisite for real-world applications. In this study, we conduct a comprehensive analysis of the generalization of machine-learning-based NIDS through extensive experimentation in a cross-dataset framework. We employ four machine learning classifiers and four datasets acquired from different networks: CIC-IDS-2017, CSE-CIC-IDS2018, LycoS-IDS2017, and LycoS-Unicas-IDS2018. The last dataset is a novel contribution, in which we apply corrections based on LycoS-IDS2017 to the well-known CSE-CIC-IDS2018 dataset. The results show nearly perfect classification performance when the models are trained and tested on the same dataset. However, when the models are trained and tested in a cross-dataset fashion, classification accuracy is largely comparable to random chance, except for a few combinations of attacks and datasets. We employ data visualization techniques to provide insights into the patterns in the data. Our analysis reveals anomalies in the data that directly hinder the classifiers' capability to generalize the learned knowledge to new scenarios. This study improves our understanding of the generalization capabilities of machine-learning-based NIDS and highlights the importance of acknowledging data heterogeneity.
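The cross-dataset protocol described in the abstract (fit on one network's data, score on every network's held-out data) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic one-feature samples, the names `net_A`, `net_B`, `make_split`, `fit_threshold`, and `cross_dataset_matrix` are hypothetical, and the toy threshold rule is not one of the four classifiers used in the study. The two synthetic networks are deliberately constructed so that the same feature is shifted between them.

```python
import random

def make_split(seed, benign_mu, attack_mu, n=300):
    """Synthetic stand-in for one network: (feature, label) rows, split in half."""
    rng = random.Random(seed)
    rows = [(rng.gauss(benign_mu, 1.0), 0) for _ in range(n)]
    rows += [(rng.gauss(attack_mu, 1.0), 1) for _ in range(n)]
    rng.shuffle(rows)
    half = len(rows) // 2
    return rows[:half], rows[half:]  # (train, test)

def fit_threshold(train):
    """Toy classifier: threshold at the midpoint between the two class means."""
    benign = [x for x, y in train if y == 0]
    attack = [x for x, y in train if y == 1]
    mu0 = sum(benign) / len(benign)
    mu1 = sum(attack) / len(attack)
    return (mu0 + mu1) / 2.0, mu1 > mu0  # (threshold, attack lies above it?)

def accuracy(model, test):
    """Fraction of test rows whose predicted label matches the true label."""
    threshold, attack_is_high = model
    hits = sum(int((x > threshold) == attack_is_high) == y for x, y in test)
    return hits / len(test)

def cross_dataset_matrix(datasets):
    """Train one model per dataset, then score it on every dataset's test split."""
    models = {name: fit_threshold(train) for name, (train, _) in datasets.items()}
    return {
        (src, dst): accuracy(models[src], test)
        for src in datasets
        for dst, (_, test) in datasets.items()
    }

# Two hypothetical networks whose feature distributions are shifted on purpose.
datasets = {
    "net_A": make_split(seed=0, benign_mu=0.0, attack_mu=4.0),
    "net_B": make_split(seed=1, benign_mu=4.0, attack_mu=8.0),
}
results = cross_dataset_matrix(datasets)
for (src, dst), acc in sorted(results.items()):
    print(f"train={src} test={dst} accuracy={acc:.2f}")
```

The diagonal entries of `results` correspond to the same-dataset setting and the off-diagonal entries to the cross-dataset setting. In this contrived example the off-diagonal accuracy collapses because benign traffic in one synthetic network looks like attack traffic in the other; it is not a reproduction of the paper's data or results.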
Pages: 144489-144508
Number of pages: 20