Improving Intrusion Detection Model Prediction by Threshold Adaptation

Cited by: 9

Authors:

Al Tobi, Amjad M. [1]

Duncan, Ishbel [2]

Affiliations:

[1] Sultan Qaboos Univ, Ctr Informat Syst, POB 40,PC 123, Al Khoud, Oman

[2] Univ St Andrews, Sch Comp Sci, St Andrews KY16 9AJ, Fife, Scotland

Source:

INFORMATION | 2019 / Volume 10 / Issue 05

Keywords:

Intrusion Detection System; anomaly-based IDS; Threshold adaptation; Prediction accuracy improvement; Machine Learning; STA2018 dataset; C5.0; Random Forest; Support Vector Machine; FEATURE-SELECTION; EVOLVING DATA; CLASSIFICATION; PERFORMANCE; TRENDS

DOI:

10.3390/info10050159

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Network traffic exhibits a high level of variability over short periods of time. This variability reduces the accuracy of anomaly-based network intrusion detection systems (IDS) that are built using predictive models in a batch learning setup. This work investigates how adapting the discriminating threshold of model predictions to the evaluated traffic improves the detection rates of these intrusion detection models. In particular, this research studied the adaptability features of three well-known machine learning algorithms: C5.0, Random Forest and Support Vector Machine. Each algorithm's ability to adapt its prediction threshold was assessed and analysed under different scenarios that simulated real-world settings using the prospective sampling approach. Multiple IDS datasets were used for the analysis, including a newly generated dataset (STA2018). This research demonstrated empirically the importance of threshold adaptation in improving the accuracy of detection models when training and evaluation traffic have different statistical properties. Tests were undertaken to analyse the effects of feature selection and data balancing on model accuracy when different significant features in traffic were used. The effects of threshold adaptation on improving accuracy were statistically analysed. Of the three compared algorithms, Random Forest was the most adaptable and had the highest detection rates.
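The abstract does not spell out how the threshold is adapted, so the following is only a minimal sketch of the general idea, under stated assumptions: a trained model emits an attack score in [0, 1] for each flow, a small labelled sample of the evaluated traffic is available, and the threshold is chosen by a simple grid search that maximises accuracy on that sample. The function names `accuracy_at` and `adapt_threshold`, the grid of candidates, and the scores and labels are all hypothetical and are not taken from the paper.

```python
from typing import Optional, Sequence, Tuple


def accuracy_at(scores: Sequence[float], labels: Sequence[int], threshold: float) -> float:
    """Accuracy when a score >= threshold is classified as attack (label 1)."""
    correct = sum(int(s >= threshold) == y for s, y in zip(scores, labels))
    return correct / len(labels)


def adapt_threshold(
    scores: Sequence[float],
    labels: Sequence[int],
    candidates: Optional[Sequence[float]] = None,
) -> Tuple[float, float]:
    """Return (threshold, accuracy) for the best candidate on a labelled sample.

    Assumption: a plain grid search over 0.01..0.99; ties keep the first
    (lowest) threshold that reaches the best accuracy.
    """
    if candidates is None:
        candidates = [i / 100 for i in range(1, 100)]
    best_t, best_acc = 0.5, accuracy_at(scores, labels, 0.5)
    for t in candidates:
        acc = accuracy_at(scores, labels, t)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc


# Hypothetical scores for evaluated traffic whose attack flows score lower
# than the model saw in training, so the default 0.5 cut-off misses most of them.
scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.45, 0.60]
labels = [0, 0, 0, 1, 1, 1, 1, 1]

default_acc = accuracy_at(scores, labels, 0.5)
adapted_t, adapted_acc = adapt_threshold(scores, labels)
print(f"default threshold 0.50: accuracy {default_acc:.2f}")
print(f"adapted threshold {adapted_t:.2f}: accuracy {adapted_acc:.2f}")
```

In this toy sample, the fixed 0.5 cut-off separates the classes poorly because the score distribution of the evaluated traffic has shifted, while a threshold fitted to that traffic recovers the separation. A real study would pick the threshold on data held apart from the data used to report accuracy.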


Pages: 42

