Online Anomaly Detection over Big Data Streams

Cited by: 0
Authors
Rettig, Laura [1, 2]
Khayati, Mourad [2]
Cudre-Mauroux, Philippe [2]
Piorkowski, Michal [1]
Affiliations
[1] Big Data & Business Intelligence Competence Ctr S, Bern, Switzerland
[2] Univ Fribourg, eXascale Infolab, CH-1700 Fribourg, Switzerland
Keywords
SKETCH;
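DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract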
Data quality is a challenging problem in many real-world application domains. While a lot of attention has been given to detecting anomalies in data at rest, detecting anomalies in streaming applications remains a largely open problem. For applications involving several data streams, the challenge of detecting anomalies has become harder over time, as data can dynamically evolve in subtle ways following changes in the underlying infrastructure. In this paper, we describe and empirically evaluate an online anomaly detection pipeline that satisfies two key conditions: generality and scalability. Our technique works on numerical data as well as on categorical data and makes no assumptions about the underlying data distributions. We implement two metrics, relative entropy and Pearson correlation, to dynamically detect anomalies. Together, the two metrics provide efficient and effective detection of anomalies over high-velocity streams of events. In the following, we describe the design and implementation of our approach in a Big Data scenario using state-of-the-art streaming components. Specifically, we build on Kafka queues and Spark Streaming to realize our approach while satisfying the generality and scalability requirements given above. We show how a combination of the two metrics we put forward can be applied to detect several types of anomalies (such as infrastructure failures, hardware misconfiguration, or user-driven anomalies) in large-scale telecommunication networks. We also discuss the merits and limitations of the resulting architecture and empirically evaluate its scalability on a real deployment over live streams capturing events from millions of mobile devices.
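The abstract names the two metrics but does not give their formulas, so the following is only a minimal sketch of how relative entropy and Pearson correlation are commonly computed over two windows of stream data. The function names, the additive smoothing, and the window representation (plain Python lists) are assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def relative_entropy(current, baseline, smoothing=1.0):
    """Relative entropy D(P || Q) in bits between two windows of categorical events.

    P is the empirical distribution of `current`, Q that of `baseline`.
    Additive smoothing (an assumption of this sketch) keeps every
    probability nonzero so the logarithm is always defined.
    """
    if smoothing <= 0:
        raise ValueError("smoothing must be positive")
    p_counts, q_counts = Counter(current), Counter(baseline)
    categories = set(p_counts) | set(q_counts)
    k = len(categories)
    p_total = len(current) + smoothing * k
    q_total = len(baseline) + smoothing * k
    divergence = 0.0
    for category in categories:
        p = (p_counts[category] + smoothing) / p_total
        q = (q_counts[category] + smoothing) / q_total
        divergence += p * math.log2(p / q)
    return divergence


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric windows.

    Returns NaN when either window is constant, because the
    coefficient is undefined in that case.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("windows must have equal length of at least 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return math.nan
    return cov / math.sqrt(var_x * var_y)
```

In a streaming deployment, such functions would typically be evaluated once per window, for example comparing the event-type distribution of the latest window against a reference window, or correlating the event counts of two related streams. A window could then be flagged when the relative entropy rises above, or the correlation falls below, a chosen threshold; any such threshold is a tuning decision that the abstract does not specify.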
Pages: 1113 - 1122
Page count: 10