DILAF: A framework for distributed analysis of large-scale system logs for anomaly detection

Cited by: 13
Authors
Astekin, Merve [1]
Zengin, Harun [2]
Sozer, Hasan [3]
Affiliations
[1] TUBITAK Informat & Informat Secur Res Ctr TUBITAK, TR-41470 Gebze, Turkey
[2] Bogazici Univ, Dept Comp Engn, Istanbul, Turkey
[3] Ozyegin Univ, Dept Comp Sci, Istanbul, Turkey
Keywords
anomaly detection; big data; distributed systems; log analysis; machine learning; parallel processing; software architecture
DOI
10.1002/spe.2653
Chinese Library Classification number
TP31 [Computer Software]
Subject classification codes
081202; 0835
Abstract
System logs constitute a rich source of information for the detection and prediction of anomalies. However, they can include a huge volume of data, which is usually unstructured or semistructured. We introduce DILAF, a framework for distributed analysis of large-scale system logs for anomaly detection. DILAF comprises several processes that facilitate log parsing, feature extraction, and machine learning activities. It has two distinguishing features with respect to existing tools. First, it does not require access to the source code of the analyzed system. Second, it is designed to perform all of its processes in a distributed manner to support scalable analysis in the context of large-scale distributed systems. We discuss the software architecture of DILAF and introduce an implementation of it. We conducted controlled experiments on two datasets to evaluate the effectiveness of the framework. In particular, we evaluated its performance and scalability under various degrees of parallelism. The results show that DILAF maintains the same accuracy levels while achieving more than 30% performance improvement on average as the system scales, compared to baseline approaches that do not employ fully distributed processing.
Pages: 153-170
Page count: 18