Intelligent Big Data Summarization for Rare Anomaly Detection

Cited by: 13
Authors
Ahmed, Mohiuddin [1]
Affiliation
[1] Edith Cowan Univ, Sch Sci, Acad Ctr Cyber Secur Excellence, Joondalup, WA 6027, Australia
Keywords
Anomaly detection; data summarization; sampling; clustering; SCADA; network traffic
DOI
10.1109/ACCESS.2019.2918364
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Identifying interesting patterns in huge amounts of data is a challenging task across a wide range of application domains. In cyber security especially, identifying rare types of network activity, or anomalies, in network traffic data (a.k.a. Big Data) is an important but time-consuming data analysis task when only moderate computing resources are available. Existing research has shown that it is possible to detect rare anomalies from a summarized version of big data. Summarization is therefore an effective preprocessing step before applying anomaly detection techniques. The aim of this paper is to improve and quantify the scalability and accuracy of anomaly detection techniques by using summarization. To that end, we propose a sampling-based summarization technique (SUCh: Summarization Using Chernoff-Bound), which is more computationally effective than existing techniques and also performs better at identifying rare anomalies in twelve benchmark network traffic datasets. The experimental results show that using a summary of the data for anomaly detection, instead of the original dataset, yields better true positive and false positive rates and requires less time.
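As a rough illustration of what a Chernoff-bound sampling step can look like, the sketch below computes a sample size using the bound from the CURE clustering paper (Guha, Rastogi and Shim) and then draws a uniform random sample of that size. This is an assumption made for illustration only: the abstract does not state the exact bound or procedure used by SUCh, and the function names and the parameters `cluster_size`, `fraction` and `delta` are hypothetical.

```python
import math
import random


def chernoff_sample_size(n, cluster_size, fraction, delta):
    """Sample size from the CURE-style Chernoff bound (assumed, not SUCh itself).

    With a uniform random sample of this size drawn from n records, the
    probability of getting fewer than fraction * cluster_size records from
    a group of cluster_size records is below delta.
    """
    if not (0 < cluster_size <= n):
        raise ValueError("cluster_size must be in (0, n]")
    if not (0 < delta < 1) or not (0 <= fraction <= 1):
        raise ValueError("delta must be in (0, 1) and fraction in [0, 1]")
    log_term = math.log(1.0 / delta)
    ratio = n / cluster_size
    s = (fraction * n
         + ratio * log_term
         + ratio * math.sqrt(log_term ** 2
                             + 2 * fraction * cluster_size * log_term))
    # A sample can never be larger than the dataset itself.
    return min(n, math.ceil(s))


def summarize(records, cluster_size, fraction=0.1, delta=0.01, seed=0):
    """Return a uniform random sample of the size given by the bound."""
    s = chernoff_sample_size(len(records), cluster_size, fraction, delta)
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return rng.sample(records, s)


if __name__ == "__main__":
    traffic = list(range(100_000))  # stand-in for network traffic records
    summary = summarize(traffic, cluster_size=1_000)
    print(len(summary), "of", len(traffic), "records kept")
```

The smaller the group that must be represented (here, a rare anomaly type), the larger the sample the bound demands, which is the trade-off any sampling-based summary for rare anomaly detection has to manage.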
Pages: 68669-68677
Number of pages: 9