Comparing Threshold Selection Methods for Network Anomaly Detection

Cited by: 5
Authors
Komadina, Adrian [1]
Martinic, Mislav [2]
Gros, Stjepan [1]
Mihajlovic, Zeljka [3]
Affiliations
[1] Univ Zagreb, Fac Elect Engn & Comp, Lab Informat Secur & Privacy, Zagreb 10000, Croatia
[2] CS Comp Syst, Zagreb 10000, Croatia
[3] Univ Zagreb, Fac Elect Engn & Comp, Dept Elect Microelect Comp & Intelligent Syst, Zagreb 10000, Croatia
Source
IEEE ACCESS | 2024 / Vol. 12
Keywords
Anomaly detection; Measurement; Receivers; Machine learning; Intrusion detection; Unsupervised learning; network data; threshold selection; unsupervised learning; OUTLIER DETECTION; TIME; AUTHENTICATION; SYSTEMS;
DOI
10.1109/ACCESS.2024.3452168
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Unsupervised machine learning models are now widely used for anomaly detection. While many research papers focus on improving and testing these models, few deal with threshold selection, which is an important step in implementing a good anomaly detection system. In this paper, we investigate supervised and unsupervised threshold selection methods found in the network anomaly detection literature. A total of five supervised and twenty unsupervised methods were found, all of which are described, categorized, and implemented in this paper. The unsupervised methods were further categorized according to the input data they expect, the type of output data they produce, and whether they are parametric, and were divided into six groups according to their underlying idea: Statistics-based, Distribution-based, Clustering-based, Density-based, Graphical-based, and Other. To test all the methods found, two testing scenarios were created: the first uses data containing anomalies, and the second uses only normal data. Based on these two scenarios, tests were performed with real firewall log data containing three types of injected anomalies. The results are presented as boxplots of the Matthews correlation coefficient for nine datasets. To draw a conclusion, both the method groups and the individual methods were compared in terms of evaluation metrics and execution times, and against the methods already implemented in the PyThresh toolkit.
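To make the idea of threshold selection concrete, the following is a minimal Python sketch of the second scenario described in the abstract (fitting on normal data only). It is not taken from the paper: the mean plus k standard deviations rule is only one common example of a statistics-based threshold, the function names `mean_std_threshold` and `mcc` are hypothetical, and the anomaly scores are invented for illustration.

```python
import math
import statistics


def mean_std_threshold(scores, k=3.0):
    """Illustrative statistics-based rule: mean + k * population std of the scores."""
    return statistics.fmean(scores) + k * statistics.pstdev(scores)


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (1 = anomaly)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Invented anomaly scores produced by some unsupervised model (assumption).
normal_scores = [0.10, 0.20, 0.15, 0.12, 0.18, 0.11, 0.14, 0.16, 0.13]
test_scores = [0.12, 0.17, 0.90, 0.14, 0.85]
test_labels = [0, 0, 1, 0, 1]

# Fit the threshold on normal data only, then apply it to unseen scores.
threshold = mean_std_threshold(normal_scores, k=3.0)
predictions = [int(score > threshold) for score in test_scores]
score_mcc = mcc(test_labels, predictions)
```

Fitting the same rule on data that already contains large anomaly scores inflates both the mean and the standard deviation, which can push the threshold above the anomalies themselves. That sensitivity is one reason the choice of threshold method and testing scenario matters.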
Pages: 124943-124973
Number of pages: 31