Identifying the most accurate machine learning classification technique to detect network threats

Cited by: 0

Authors:

Mohamed Farouk

Rasha Hassan Sakr

Noha Hikal

Affiliations:

[1] Mansoura University, Department of Information Security, Faculty of Computers and Information Sciences

[2] Mansoura University, Department of Computer Science, Faculty of Computers and Information Sciences

[3] Mansoura University, Department of Information Technology, Faculty of Computers and Information Sciences

Source:

Neural Computing and Applications | 2024 / Vol. 36

Keywords:

Machine learning; Insider threats; Insider attacks; NSL-KDD data set; Cybersecurity

DOI:

Not available

Abstract:

Insider threats have recently become one of the most urgent cybersecurity challenges facing numerous organizations, such as public infrastructure companies, major federal agencies, and state and local governments. Our purpose is to find the most accurate machine learning (ML) model for detecting insider attacks. In machine learning, the most suitable classifier is usually selected after repeated evaluation trials of candidate models, which can cause unseen data (the test data set) to leak into the models and create bias. Overfitting can likewise result from frequent retraining of models and tuning of hyperparameters: the models perform well on the training set but fail to generalize to unseen data. This study uses a validation data set together with hyperparameter tuning to prevent these issues and to choose the best of our candidate models. Furthermore, through the use of the NSL-KDD data set, our approach guarantees that the selected model does not memorize data on threats occurring in the local area network (LAN). Results are gathered and analyzed for the following models: support vector machine (SVM), decision tree (DT), logistic regression (LR), adaptive boosting (AdaBoost), gradient boosting (GB), random forests (RFs), and extremely randomized trees (ERTs). After analyzing the findings, we conclude that the AdaBoost model is the most accurate, scoring 99% for DoS, 99% for probe, 96% for access, and 97% for privilege, with an AUC of 0.992 for DoS, 0.986 for probe, 0.952 for access, and 0.954 for privilege.
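The role of a validation data set in avoiding test-set leakage can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes synthetic two-class data instead of NSL-KDD, a toy k-nearest-neighbour classifier instead of the models listed in the abstract, and a 60/20/20 split chosen only for illustration. All function names (`make_data`, `knn_predict`, `select_and_evaluate`) are hypothetical. The point it shows is that the hyperparameter is chosen on the validation split alone, and the test split is scored exactly once afterwards.

```python
import random


def make_data(n, rng):
    """Synthetic two-class data: two overlapping 2-D Gaussian blobs (an assumption)."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 1.5 * label
        point = (rng.gauss(center, 1.0), rng.gauss(center, 1.0))
        data.append((point, label))
    return data


def knn_predict(train, point, k):
    """Majority vote among the k training points closest to `point`."""
    nearest = sorted(
        train,
        key=lambda item: (item[0][0] - point[0]) ** 2 + (item[0][1] - point[1]) ** 2,
    )[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0


def accuracy(train, evaluation, k):
    """Fraction of points in `evaluation` classified correctly using `train`."""
    hits = sum(knn_predict(train, p, k) == y for p, y in evaluation)
    return hits / len(evaluation)


def select_and_evaluate(candidates=(1, 5, 15), n=300, seed=0):
    rng = random.Random(seed)
    data = make_data(n, rng)
    rng.shuffle(data)

    # 60% training, 20% validation, 20% test (illustrative proportions).
    n_train, n_val = n * 6 // 10, n * 2 // 10
    train = data[:n_train]
    val = data[n_train:n_train + n_val]
    test = data[n_train + n_val:]

    # Hyperparameter tuning looks only at the validation split.
    val_scores = {k: accuracy(train, val, k) for k in candidates}
    best_k = max(val_scores, key=val_scores.get)

    # The test split is scored exactly once, after the choice is fixed.
    test_score = accuracy(train, test, best_k)
    return best_k, val_scores, test_score


if __name__ == "__main__":
    best_k, val_scores, test_score = select_and_evaluate()
    print("validation accuracy per k:", val_scores)
    print("selected k:", best_k, "| test accuracy:", round(test_score, 3))
```

Because the test split never influences which `k` is selected, its score remains an unbiased estimate of performance on unseen data, which is the property the abstract attributes to using a separate validation set.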


Pages: 8977-8994

Page count: 17
