UN-AVOIDS: Unsupervised and Nonparametric Approach for Visualizing Outliers and Invariant Detection Scoring

Cited by: 10
Authors
Yousef, Waleed A. [1 ,2 ]
Traore, Issa [1 ]
Briguglio, William [1 ]
Affiliations
[1] Univ Victoria, Dept Elect & Comp Engn, Victoria, BC V8P 5C2, Canada
[2] Helwan Univ, Dept Comp Sci, Human Comp Interact Lab HCILab, Cairo 11795, Egypt
Keywords
Visualization; Data visualization; Anomaly detection; Feature extraction; Detection algorithms; Computer security; Complexity theory; Unsupervised; nonparametric; visualization; outliers; anomaly; intrusion detection; intrusion analysis;
DOI
10.1109/TIFS.2021.3125608
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
The visualization and detection of anomalies (outliers) are of crucial importance to many fields, particularly cybersecurity. Several approaches have been proposed in these fields, yet to the best of our knowledge, none of them has fulfilled both objectives, simultaneously or cooperatively, in one coherent framework. Moreover, the visualization methods of these approaches were introduced to explain the output of a detection algorithm, not for data exploration that facilitates standalone visual detection. This is our point of departure in introducing UN-AVOIDS, an unsupervised and nonparametric approach for both visualization (a human process) and detection (an algorithmic process) of outliers, which assigns invariant anomaly scores (normalized to [0, 1]) rather than a hard binary decision. The main novelty of UN-AVOIDS is that it transforms data into a new space, introduced in this paper as the neighborhood cumulative density function (NCDF), in which both visualization and detection are carried out. In this space, outliers are remarkably visually distinguishable, and therefore the anomaly scores assigned by the detection algorithm achieved a high area under the ROC curve (AUC). We assessed UN-AVOIDS on simulated data and on two recently published cybersecurity datasets, and compared it to three of the most successful anomaly detection methods: LOF, IF, and FABOD. In terms of AUC, UN-AVOIDS was almost an overall winner, with a margin that varied between -0.028 and 0.125, depending on the data. The article concludes by providing a preview of new theoretical and practical avenues for UN-AVOIDS. Among them is designing visualization-aided anomaly detection (VAAD) software, which aids analysts by providing the UN-AVOIDS detection algorithm (running in a back engine) and the NCDF visualization space (rendered to plots), along with other conventional methods of visualization in the original feature space, all of which are linked in one interactive environment.
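
The abstract describes scoring each point in [0, 1] from a neighborhood-based cumulative curve and evaluating the scores by AUC. The sketch below is only a simplified illustration of that general idea, not the paper's NCDF definition or the UN-AVOIDS algorithm: the functions `neighborhood_curve`, `anomaly_scores`, and `auc`, the radius grid, and the toy data are all assumptions made for this example.

```python
import math
import random


def neighborhood_curve(points, i, radii):
    """Fraction of the other points lying within each radius of points[i].

    Illustrative stand-in for a neighborhood cumulative curve; this is an
    assumption of the sketch, not the definition used in the paper.
    """
    dists = [math.dist(points[i], p) for j, p in enumerate(points) if j != i]
    return [sum(d <= r for d in dists) / len(dists) for r in radii]


def anomaly_scores(points, radii):
    """Score = 1 - mean height of the curve, so every score lies in [0, 1].

    Isolated points gather neighbors slowly, so their curves stay low and
    their scores come out high.
    """
    scores = []
    for i in range(len(points)):
        curve = neighborhood_curve(points, i, radii)
        scores.append(1.0 - sum(curve) / len(curve))
    return scores


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): a Gaussian cluster plus three distant outliers.
random.seed(0)
inliers = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(200)]
outliers = [(8.0, 8.0), (-9.0, 7.0), (10.0, -6.0)]
points = inliers + outliers
labels = [0] * len(inliers) + [1] * len(outliers)

radii = [0.5 * k for k in range(1, 41)]  # radius grid 0.5, 1.0, ..., 20.0
scores = anomaly_scores(points, radii)
print("AUC:", auc(scores, labels))
```

Because the scores are continuous rather than binary, AUC can be computed directly from them without choosing a decision threshold, which is the evaluation style the abstract reports.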
Pages: 5195-5210
Page count: 16