Evaluating ML-based anomaly detection across datasets of varied integrity: A case study

Cited by: 1
Authors
Pekar, Adrian [1, 2]
Jozsa, Richard [1]
Affiliations
[1] Budapest Univ Technol & Econ, Muegyetem Rkp 3, H-1111 Budapest, Hungary
[2] HUN REN BME Informat Syst Res Grp, Magyar Tudosok Krt 2, H-1117 Budapest, Hungary
Keywords
CICIDS-2017; CICFlowMeter; NFStream; Random forest; Network traffic flow; Anomaly detection; Cybersecurity
DOI
10.1016/j.comnet.2024.110617
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Cybersecurity remains a critical challenge in the digital age, with network traffic flow anomaly detection being a pivotal instrument in the fight against cyber threats. In this study, we address the prevalent issue of data integrity in network traffic datasets, which are instrumental in developing machine learning (ML) models for anomaly detection. We introduce two refined versions of the CICIDS-2017 dataset, NFS-2023-nTE and NFS-2023-TE, processed using NFStream to ensure methodologically sound flow expiration and labeling. Our research contrasts the performance of the Random Forest (RF) algorithm across the original CICIDS-2017, its refined counterparts WTMC-2021 and CRiSIS-2022, and our NFStream-generated datasets, in both binary and multi-class classification contexts. We observe that the RF model exhibits exceptional robustness, achieving consistently high performance metrics irrespective of the underlying dataset quality, which prompts a critical discussion on the actual impact of data integrity on ML efficacy. Our study underscores the importance of continual refinement and methodological rigor in dataset generation for network security research. As the landscape of network threats evolves, so must the tools and techniques used to detect and analyze them.
Pages: 18