Random resampling algorithms for addressing the imbalanced dataset classes in insider threat detection

Cited by: 10
Authors
Al-Shehari, Taher [1]
Alsowail, Rakan A. [1]
Affiliations
[1] King Saud Univ, Self Dev Skills Dept, Comp Skills, Deanship Common Year 1, Riyadh 11362, Saudi Arabia
Keywords
Insider data leakage detection; Imbalanced data classification; Resampling techniques; Machine learning model
DOI
10.1007/s10207-022-00651-1
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Cybersecurity threats can be perpetrated by insiders or outsiders. Threats carried out by insiders are far more serious because of their privileged access, which they may use to cause financial loss and reputational harm to an organization. Insider threats therefore represent a major cybersecurity challenge for private and government organizations. Researchers and cybersecurity practitioners have proposed different approaches for detecting and mitigating insider threats, but they face many challenges (e.g., limited dataset availability and the highly imbalanced classes of the available datasets). Because insider threat datasets are scarce, the benchmark dataset provided by the Computer Emergency Response Team (CERT) has been used to validate the majority of insider threat detection approaches. The CERT insider threat dataset is extremely imbalanced; hence, when it is used to validate an insider threat detection model, the detection results may be biased and inaccurate. Most existing insider threat detection approaches ignore this imbalance. As a result, an effective model is required to detect insider data leakage incidents in an imbalanced dataset more precisely. In this paper, an insider data leakage detection model is proposed that leverages various random sampling techniques and well-known machine learning algorithms to deal with the dataset's extremely imbalanced classes. We evaluate the model on the CERT r4.2 insider threat dataset using different sampling techniques, and then compare its performance with the baseline and with existing work. The empirical results show that, by resolving the imbalanced dataset issue, our model improves the detection of insider data leakage events and surpasses existing approaches.
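The abstract does not spell out which random sampling techniques are used, so the following is only a minimal sketch of the two most basic ones, random oversampling and random undersampling, on toy data. The function names `random_oversample` and `random_undersample` and the 95/5 class split are illustrative assumptions, not the authors' implementation or the CERT r4.2 data.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class until
    every class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        idx = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            i = rng.choice(idx)  # sample with replacement
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """Keep a random subset of each class so that every class has
    as many rows as the smallest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    keep = []
    for label in counts:
        idx = [i for i, lab in enumerate(y) if lab == label]
        keep.extend(rng.sample(idx, target))  # sample without replacement
    keep.sort()  # preserve the original row order
    return [X[i] for i in keep], [y[i] for i in keep]


if __name__ == "__main__":
    # Hypothetical toy data: 95 "normal" rows (0) and 5 "malicious" rows (1).
    X = [[i] for i in range(100)]
    y = [0] * 95 + [1] * 5

    _, y_over = random_oversample(X, y)
    _, y_under = random_undersample(X, y)
    print("original:    ", dict(Counter(y)))
    print("oversampled: ", dict(Counter(y_over)))
    print("undersampled:", dict(Counter(y_under)))
```

In practice, resampling of this kind is usually applied to the training split only, so that the test split keeps the original class distribution and the reported detection results are not inflated.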
Pages: 611-629
Number of pages: 19