Improving Intrusion Detection Model Prediction by Threshold Adaptation

Cited by: 9

Authors:

Al Tobi, Amjad M. [1]

Duncan, Ishbel [2]

Affiliations:

[1] Sultan Qaboos Univ, Ctr Informat Syst, POB 40,PC 123, Al Khoud, Oman

[2] Univ St Andrews, Sch Comp Sci, St Andrews KY16 9AJ, Fife, Scotland

Source:

INFORMATION | 2019 / Vol. 10 / Issue 05

Keywords:

Intrusion Detection System; anomaly-based IDS; Threshold adaptation; Prediction accuracy improvement; Machine Learning; STA2018 dataset; C5.0; Random Forest; Support Vector Machine; FEATURE-SELECTION; EVOLVING DATA; CLASSIFICATION; PERFORMANCE; TRENDS

DOI:

10.3390/info10050159

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology]

Discipline classification code:

0812

Abstract:

Network traffic exhibits a high level of variability over short periods of time. This variability impacts negatively on the accuracy of anomaly-based network intrusion detection systems (IDS) that are built using predictive models in a batch learning setup. This work investigates how adapting the discriminating threshold of model predictions to the evaluated traffic improves the detection rates of these intrusion detection models. Specifically, this research studied the adaptability features of three well-known machine learning algorithms: C5.0, Random Forest and Support Vector Machine. Each algorithm's ability to adapt its prediction threshold was assessed and analysed under different scenarios that simulated real-world settings using the prospective sampling approach. Multiple IDS datasets were used for the analysis, including a newly generated dataset (STA2018). This research demonstrated empirically the importance of threshold adaptation in improving the accuracy of detection models when training and evaluation traffic have different statistical properties. Tests were undertaken to analyse the effects of feature selection and data balancing on model accuracy when different significant features in traffic were used. The effects of threshold adaptation on improving accuracy were statistically analysed. Of the three compared algorithms, Random Forest was the most adaptable and had the highest detection rates.
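The abstract does not spell out the adaptation procedure, so the following is only a minimal sketch of the general idea of moving a discriminating threshold, not the authors' method. It assumes a model that outputs an attack score in [0, 1] and a small labelled sample of the evaluated traffic; the function names `accuracy_at` and `adapt_threshold` and the scores and labels are hypothetical.

```python
def accuracy_at(scores, labels, threshold):
    """Fraction of samples classified correctly when score >= threshold means 'attack'."""
    correct = sum((s >= threshold) == bool(y) for s, y in zip(scores, labels))
    return correct / len(labels)


def adapt_threshold(scores, labels, candidates=None):
    """Return the candidate threshold with the highest accuracy on this sample.

    Ties are resolved in favour of the lowest candidate.
    """
    if candidates is None:
        candidates = [i / 100 for i in range(1, 100)]
    return max(candidates, key=lambda t: accuracy_at(scores, labels, t))


# Hypothetical scores for evaluation traffic whose attack scores sit lower
# than in the training traffic (assumed data, for illustration only).
eval_scores = [0.10, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.60]
eval_labels = [0, 0, 0, 1, 1, 1, 1, 1]  # 1 = attack, 0 = normal

default_acc = accuracy_at(eval_scores, eval_labels, 0.5)  # fixed threshold
best_t = adapt_threshold(eval_scores, eval_labels)         # adapted threshold
adapted_acc = accuracy_at(eval_scores, eval_labels, best_t)

print(f"fixed 0.5: {default_acc:.2f}, adapted {best_t:.2f}: {adapted_acc:.2f}")
```

On this toy sample the fixed threshold misses the attacks that score below 0.5, while the adapted threshold separates the two classes. Real traffic would rarely separate this cleanly, and a labelled sample of the evaluated traffic may not be available in deployment.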

References

Pages: 42

99 entries in total

[81] NEMENYI P, 1962, BIOMETRICS, V18, P263

[82] Onut I.-V., 2007, International Journal of Network Security, V5, P1

[83] Threshold optimisation for multi-label classifiers [J]. Pillai, Ignazio; Fumera, Giorgio; Roli, Fabio. PATTERN RECOGNITION, 2013, 46 (07): 2055-2065

[84] Quinlan J. R., 2014, C4 5 PROGRAMS MACHIN

[85] Rudnicki WR, 2015, STUD COMPUT INTELL, V584, P11, DOI 10.1007/978-3-662-45620-0_2

[86] The unequal variance t-test is an underused alternative to Student's t-test and the Mann-Whitney U test [J]. Ruxton, GD. BEHAVIORAL ECOLOGY, 2006, 17 (04): 688-690

[87] SHAPIRO SS, 1965, BIOMETRIKA, V52, P591, DOI 10.2307/2333709

[88] Toward developing a systematic approach to generate benchmark datasets for intrusion detection [J]. Shiravi, Ali; Shiravi, Hadi; Tavallaee, Mahbod; Ghorbani, Ali A. COMPUTERS & SECURITY, 2012, 31 (03): 357-374

[89] S-MAIDS: A Semantic Model for Automated Tuning, Correlation, and Response Selection in Intrusion Detection Systems [J]. Strasburg, Chris; Basu, Samik; Wong, Johnny S. 2013 IEEE 37TH ANNUAL COMPUTER SOFTWARE AND APPLICATIONS CONFERENCE (COMPSAC), 2013: 319-328

[90] Street W. N., 2001, KDD-2001. Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, P377, DOI 10.1145/502512.502568
