Comparing Threshold Selection Methods for Network Anomaly Detection

Cited by: 5
Authors
Komadina, Adrian [1]
Martinic, Mislav [2]
Gros, Stjepan [1]
Mihajlovic, Zeljka [3]
Affiliations
[1] Univ Zagreb, Fac Elect Engn & Comp, Lab Informat Secur & Privacy, Zagreb 10000, Croatia
[2] CS Comp Syst, Zagreb 10000, Croatia
[3] Univ Zagreb, Fac Elect Engn & Comp, Dept Elect Microelect Comp & Intelligent Syst, Zagreb 10000, Croatia
Source
IEEE Access | 2024 / Vol. 12
Keywords
Anomaly detection; Measurement; Receivers; Machine learning; Intrusion detection; Unsupervised learning; network data; threshold selection; OUTLIER DETECTION; TIME; AUTHENTICATION; SYSTEMS
DOI
10.1109/ACCESS.2024.3452168
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Unsupervised machine learning models are now widely used for anomaly detection. While many research papers focus on improving and testing these models, few address threshold selection, which is an important step in implementing a good anomaly detection system. In this paper, we investigate supervised and unsupervised threshold selection methods found in the network anomaly detection literature. We found five supervised and twenty unsupervised methods, all of which are described, categorized, and implemented in this paper. The unsupervised methods were further categorized according to the input data they expect, the type of output data they produce, and whether they are parametric, and were divided into six groups according to their underlying idea: Statistics-based, Distribution-based, Clustering-based, Density-based, Graphical-based, and Other. To test all the methods, we created two testing scenarios: the first uses data containing anomalies, and the second uses only normal data. Based on these two scenarios, tests were performed on real firewall log data containing three types of injected anomalies. The results are presented as boxplots of the Matthews correlation coefficient for nine datasets. To draw conclusions, both the method groups and the individual methods were compared in terms of evaluation metrics and execution times, and against the methods already implemented in the PyThresh toolkit.
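As a rough illustration of what threshold selection on anomaly scores involves, the sketch below fits a simple statistics-style threshold (mean plus k standard deviations) on normal-only scores, loosely mirroring the second testing scenario, and evaluates the resulting predictions with the Matthews correlation coefficient. The scores, labels, function names, and the choice k = 3 are all hypothetical; this is not one of the paper's implementations and does not use the PyThresh API.

```python
import math

def sigma_threshold(normal_scores, k=3.0):
    """Illustrative rule: mean + k * (population) standard deviation."""
    n = len(normal_scores)
    mean = sum(normal_scores) / n
    var = sum((s - mean) ** 2 for s in normal_scores) / n
    return mean + k * math.sqrt(var)

def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (1 = anomaly)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0 when the coefficient is undefined.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

# Hypothetical anomaly scores produced by some unsupervised model.
normal_scores = [0.10, 0.20, 0.15, 0.12, 0.18, 0.11, 0.14, 0.16, 0.13, 0.17]
test_scores = normal_scores + [0.90, 0.95]   # two injected anomalies
test_labels = [0] * 10 + [1] * 2

threshold = sigma_threshold(normal_scores, k=3.0)
predictions = [1 if s > threshold else 0 for s in test_scores]
score = mcc(test_labels, predictions)
print(f"threshold={threshold:.4f}, MCC={score:.2f}")
```

On this toy data the threshold falls between the normal and the injected scores, so the classes separate cleanly; real firewall-log scores are rarely this well behaved, which is why the choice of threshold method matters.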
Pages: 124943-124973
Page count: 31