Statistical Technique for Online Anomaly Detection Using Spark Over Heterogeneous Data from Multi-source VMware Performance Data

Cited by: 0
Authors
Solaimani, Mohiuddin [1]
Iftekhar, Mohammed [1]
Khan, Latifur [1]
Thuraisingham, Bhavani [1]
Affiliation
[1] Univ Texas Dallas, Dept Comp Sci, Richardson, TX 75083 USA
Source
2014 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA) | 2014
Keywords
Anomaly detection; Online anomaly detection; Real-time anomaly detection; Chi-square test; Data center; Apache Spark
DOI
Not available
Chinese Library Classification
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Anomaly detection refers to the identification of patterns in a dataset that do not conform to expected patterns. Depending on the domain, the non-conformant patterns are given various names, e.g. anomalies, outliers, exceptions, malware and so forth. Online anomaly detection aims to detect anomalies in data that arrives as a stream. Such stream data is commonplace in today's cloud data centers, which house a large array of virtual machines (VMs) producing vast amounts of performance data in real time. A sophisticated detection mechanism will likely entail collating data from heterogeneous sources with diverse data formats and semantics. Detecting performance anomalies in this context therefore requires a distributed framework with high throughput and low latency. Apache Spark is one such framework, and it represents the bleeding edge among its contemporaries. In this paper, we take up the challenge of anomaly detection in VMware-based cloud data centers. We employ a Chi-square based statistical anomaly detection technique in Spark, and we demonstrate how to take advantage of Spark's high processing power to perform anomaly detection on heterogeneous data using statistical techniques. Our approach is designed to cope with the heterogeneity of the input data streams, and the experiments we conducted testify to its efficacy in online anomaly detection.
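The abstract names a Chi-square based test but gives no details, so the following is only a minimal sketch of how such a test might flag a window of performance readings. It is not the authors' implementation and does not use Spark. The function names, the bin edges, the baseline probabilities and the threshold are all assumptions made for illustration. The idea: bin the readings in a window, compare the observed bin counts with the counts expected under a baseline distribution using Pearson's chi-square statistic, and flag the window when the statistic exceeds a critical value.

```python
from collections import Counter


def bin_value(value, edges):
    """Return the index of the bin that `value` falls into.

    `edges` is an ascending list of upper bin boundaries (hypothetical);
    values at or above the last edge land in the final bin.
    """
    for i, edge in enumerate(edges):
        if value < edge:
            return i
    return len(edges)


def chi_square_statistic(observed_counts, expected_probs):
    """Pearson chi-square statistic: sum over bins of (O - E)^2 / E."""
    total = sum(observed_counts.values())
    stat = 0.0
    for bin_id, prob in expected_probs.items():
        expected = total * prob
        if expected > 0:  # skip bins the baseline says never occur
            observed = observed_counts.get(bin_id, 0)
            stat += (observed - expected) ** 2 / expected
    return stat


def is_anomalous(window, edges, expected_probs, threshold):
    """Flag a window whose bin counts deviate too far from the baseline."""
    counts = Counter(bin_value(v, edges) for v in window)
    return chi_square_statistic(counts, expected_probs) > threshold


# Assumed setup: CPU usage in percent, four bins, an invented baseline.
EDGES = [25, 50, 75]
BASELINE = {0: 0.40, 1: 0.40, 2: 0.15, 3: 0.05}
# Chi-square critical value for 3 degrees of freedom at the 0.05 level.
THRESHOLD = 7.815
```

In a streaming setting, a check like `is_anomalous` would presumably run once per window of incoming readings, with the baseline estimated from data assumed to be normal; how the paper actually handles windows, baselines and heterogeneous sources is not stated in this record.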
Pages: 1086-1094
Page count: 9