Random resampling algorithms for addressing the imbalanced dataset classes in insider threat detection

Cited by: 10

Authors:

Al-Shehari, Taher [1]

Alsowail, Rakan A. [1]

Affiliations:

[1] King Saud Univ, Self Dev Skills Dept, Comp Skills, Deanship Common Year 1, Riyadh 11362, Saudi Arabia

Source:

INTERNATIONAL JOURNAL OF INFORMATION SECURITY | 2023 / Vol. 22 / Issue 03

Keywords:

Insider data leakage detection; Imbalanced data classification; Resampling techniques; Machine learning model

DOI:

10.1007/s10207-022-00651-1

Chinese Library Classification (CLC):

TP [Automation Technology, Computer Technology]

Subject classification code:

0812

Abstract:

Cybersecurity threats can be perpetrated by insiders or outsiders. Threats carried out by insiders are far more serious because of their privileged access, which they may use to cause financial loss and reputational harm to an organization. Insider threats therefore represent a major cybersecurity challenge for private and government organizations. Researchers and cybersecurity practitioners have proposed different approaches for detecting and mitigating insider threats, but they face many challenges (e.g., dataset availability and the highly imbalanced classes of the available dataset). Because insider threat datasets are scarce, the benchmark dataset provided by the Computer Emergency Response Team (CERT) has been used to validate the majority of insider threat detection approaches. The CERT insider threat dataset is extremely imbalanced; hence, when it is used to validate an insider threat detection model, the detection results may be biased and inaccurate. Most existing insider threat detection approaches ignore this imbalance. As a result, an effective model is required to detect insider data leakage incidents in an imbalanced dataset more precisely. In this paper, an insider data leakage detection model is proposed that leverages various random sampling techniques and well-known machine learning algorithms to deal with the dataset's extremely imbalanced classes. We evaluate the model on the CERT r4.2 insider threat dataset using different sampling techniques, and then compare its performance with the baseline and with existing work. The empirical results show that, by resolving the imbalanced dataset issue, our model improves the detection of insider data leakage events and surpasses existing approaches.
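
The abstract does not list which random sampling techniques the paper uses or how they are configured. As a rough illustration of the general idea only, the sketch below shows the two simplest forms of random resampling in plain Python: oversampling duplicates randomly chosen minority rows until the classes match, and undersampling keeps a random subset of the majority rows. The function name `random_resample`, the toy 95/5 class split, and the seed are assumptions made for this example and are not taken from the paper or from the CERT r4.2 dataset.

```python
import random
from collections import Counter, defaultdict


def random_resample(X, y, strategy="over", seed=0):
    """Balance class counts by random over- or undersampling.

    strategy="over":  grow every class to the size of the largest class
                      by adding randomly chosen duplicates.
    strategy="under": shrink every class to the size of the smallest class
                      by keeping a random subset.
    Hypothetical helper for illustration; not the paper's implementation.
    """
    if strategy not in ("over", "under"):
        raise ValueError("strategy must be 'over' or 'under'")

    rng = random.Random(seed)

    # Group row indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(y):
        by_class[label].append(idx)

    sizes = [len(idxs) for idxs in by_class.values()]
    target = max(sizes) if strategy == "over" else min(sizes)

    chosen = []
    for idxs in by_class.values():
        if len(idxs) >= target:
            # Keep exactly `target` rows, sampled without replacement.
            chosen.extend(rng.sample(idxs, target))
        else:
            # Keep all original rows and add random duplicates.
            chosen.extend(idxs)
            chosen.extend(rng.choices(idxs, k=target - len(idxs)))

    rng.shuffle(chosen)
    return [X[i] for i in chosen], [y[i] for i in chosen]


# Toy data (assumed): 95 "normal" rows (label 0), 5 "leakage" rows (label 1).
X = [[i] for i in range(100)]
y = [0] * 95 + [1] * 5

X_over, y_over = random_resample(X, y, strategy="over")
X_under, y_under = random_resample(X, y, strategy="under")

print("original:    ", Counter(y))
print("oversampled: ", Counter(y_over))
print("undersampled:", Counter(y_under))
```

In common practice, resampling of this kind is applied to the training split only, so that the evaluation data keeps the original class distribution; whether the paper follows that protocol is not stated in the abstract.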

Citation

Pages: 611-629

Page count: 19
