Random resampling algorithms for addressing the imbalanced dataset classes in insider threat detection

Cited by: 10
Authors
Al-Shehari, Taher [1]
Alsowail, Rakan A. [1]
Affiliations
[1] King Saud Univ, Self Dev Skills Dept, Comp Skills, Deanship Common Year 1, Riyadh 11362, Saudi Arabia
Keywords
Insider data leakage detection; Imbalanced data classification; Resampling techniques; Machine learning model
DOI
10.1007/s10207-022-00651-1
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Cybersecurity threats can be perpetrated by insiders or outsiders. Threats carried out by insiders are far more serious because of their privileged access, which they may use to cause financial loss and reputational harm to an organization. Insider threats therefore represent a major cybersecurity challenge for private and government organizations. Researchers and cybersecurity practitioners have proposed different approaches for detecting and mitigating insider threats, but they face many challenges (e.g., dataset availability and the highly imbalanced classes of the available dataset). Because insider threat datasets are scarce, the benchmark dataset provided by the Computer Emergency Response Team (CERT) has been used to validate the majority of insider threat detection approaches. The CERT insider threat dataset is extremely imbalanced, so when it is used to validate an insider threat detection model, the detection results may be biased and inaccurate. Most existing insider threat detection approaches ignore this imbalance. As a result, an effective model is required to detect insider data leakage incidents in an imbalanced dataset more precisely. In this paper, an insider data leakage detection model is proposed that leverages various random sampling techniques and well-known machine learning algorithms to deal with the dataset's extremely imbalanced classes. We evaluate the model on the CERT r4.2 insider threat dataset using different sampling techniques, and then compare its performance with the baseline and with existing work. The empirical results show that, by resolving the imbalanced dataset issue, our model improves the detection of insider data leakage events and surpasses existing approaches.
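The abstract does not spell out which random sampling techniques are used or how they are implemented. As a rough illustration only, the two most basic forms of random resampling (oversampling the minority class and undersampling the majority class) might be sketched as follows. The function names, the toy data, and the 5%/95% class split are assumptions made for this example; they are not taken from the paper or from the CERT r4.2 dataset.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of smaller classes until every
    class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(rows))  # sampling with replacement
            y_out.append(label)
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """Keep a random subset of each class so that every class has as
    many rows as the smallest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for label in counts:
        rows = [x for x, lab in zip(X, y) if lab == label]
        for x in rng.sample(rows, target):  # sampling without replacement
            X_out.append(x)
            y_out.append(label)
    return X_out, y_out


# Hypothetical toy data: 5 "malicious" rows (label 1), 95 "normal" rows (label 0).
X = [[i] for i in range(100)]
y = [1 if i < 5 else 0 for i in range(100)]

X_over, y_over = random_oversample(X, y)
X_under, y_under = random_undersample(X, y)

print(Counter(y_over))   # both classes now have 95 rows
print(Counter(y_under))  # both classes now have 5 rows
```

In practice, resampling of this kind is usually applied to the training split only, so that the evaluation data keeps the original class distribution.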
Pages: 611-629
Number of pages: 19