Online Anomaly Detection over Big Data Streams

Cited by: 0
Authors
Rettig, Laura [1, 2]
Khayati, Mourad [2]
Cudre-Mauroux, Philippe [2]
Piorkowski, Michal [1]
Affiliations
[1] Big Data & Business Intelligence Competence Ctr S, Bern, Switzerland
[2] Univ Fribourg, eXascale Infolab, CH-1700 Fribourg, Switzerland
Keywords
SKETCH;
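DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract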
Data quality is a challenging problem in many real-world application domains. While much attention has been given to detecting anomalies in data at rest, detecting anomalies in streaming applications remains largely an open problem. For applications involving several data streams, the challenge of detecting anomalies has become harder over time, as data can dynamically evolve in subtle ways following changes in the underlying infrastructure. In this paper, we describe and empirically evaluate an online anomaly detection pipeline that satisfies two key conditions: generality and scalability. Our technique works on numerical data as well as on categorical data and makes no assumption on the underlying data distributions. We implement two metrics, relative entropy and Pearson correlation, to dynamically detect anomalies. The two metrics we use provide an efficient and effective detection of anomalies over high-velocity streams of events. In the following, we describe the design and implementation of our approach in a Big Data scenario using state-of-the-art streaming components. Specifically, we build on Kafka queues and Spark Streaming to realize our approach while satisfying the generality and scalability requirements given above. We show how a combination of the two metrics we put forward can be applied to detect several types of anomalies (such as infrastructure failures, hardware misconfiguration, or user-driven anomalies) in large-scale telecommunication networks. We also discuss the merits and limitations of the resulting architecture and empirically evaluate its scalability on a real deployment over live streams capturing events from millions of mobile devices.
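The abstract names relative entropy and Pearson correlation as the two detection metrics but does not say how they are computed. The following is a minimal sketch of both metrics on in-memory windows, not the authors' pipeline. The function names, the additive smoothing constant, and the idea of comparing a current window against a baseline window are assumptions made for illustration. Kafka and Spark Streaming are left out entirely, and numerical data would first need to be binned into categories before calling `relative_entropy`.

```python
import math
from collections import Counter


def relative_entropy(current, baseline, smoothing=1e-9):
    """Kullback-Leibler divergence D(P || Q) between two windows of events.

    `current` and `baseline` are iterables of hashable category labels.
    `smoothing` (an assumed value) keeps unseen categories from producing
    a division by zero or log(0).
    """
    p_counts, q_counts = Counter(current), Counter(baseline)
    categories = set(p_counts) | set(q_counts)
    p_total = sum(p_counts.values()) + smoothing * len(categories)
    q_total = sum(q_counts.values()) + smoothing * len(categories)
    divergence = 0.0
    for category in categories:
        p = (p_counts[category] + smoothing) / p_total
        q = (q_counts[category] + smoothing) / q_total
        divergence += p * math.log(p / q)
    return divergence


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series.

    Returns NaN when either series is constant, because the coefficient
    is undefined in that case.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have the same length, at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return math.nan
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical event types observed in two consecutive windows.
    baseline_window = ["call", "sms"] * 50
    current_window = ["call"] * 100
    print(relative_entropy(current_window, baseline_window))

    # Hypothetical per-interval event counts from two related streams.
    print(pearson([10, 20, 30, 40], [11, 19, 32, 41]))
```

Under these assumptions, a relative entropy close to zero means the current window is distributed like the baseline, while a larger value flags a shift. A Pearson coefficient that drops for two streams that normally move together would be a second, independent signal. Any alerting thresholds would have to be chosen per deployment.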
Pages: 1113-1122
Page count: 10
Related papers
50 in total
  • [1] Anomaly Detection Guidelines for Data Streams in Big Data
    Rana, Annie Ibrahim
    Estrada, Giovani
    Sole, Marc
    Muntes, Victor
    2016 3RD INTERNATIONAL CONFERENCE ON SOFT COMPUTING & MACHINE INTELLIGENCE (ISCMI 2016), 2016, : 94 - 98
  • [2] Online Anomaly Detection in Big Data
    Balasingam, B.
    Sankavaram, M. S.
    Choi, K.
    Ayala, D. F. M.
    Sidoti, D.
    Pattipati, K.
    Willett, P.
    Lintz, C.
    Commeau, G.
    Dorigo, F.
    Fahrny, J.
    2014 17TH INTERNATIONAL CONFERENCE ON INFORMATION FUSION (FUSION), 2014,
  • [3] OHODIN - Online Anomaly Detection for Data Streams
    Gruhl, Christian
    Tomforde, Sven
    2021 IEEE INTERNATIONAL CONFERENCE ON AUTONOMIC COMPUTING AND SELF-ORGANIZING SYSTEMS COMPANION (ACSOS-C 2021), 2021, : 193 - 197
  • [4] A Statistical Technique for Online Anomaly Detection for Big Data Streams in Cloud Collaborative Environment
    Smrithy, G. S.
    Balakrishnan, Ramadoss
    2016 IEEE INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION TECHNOLOGY (CIT), 2016, : 108 - 111
  • [5] Online Clustering for Evolving Data Streams with Online Anomaly Detection
    Chenaghlou, Milad
    Moshtaghi, Masud
    Leckie, Christopher
    Salehi, Mahsa
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2018, PT II, 2018, 10938 : 506 - 519
  • [6] Memory-efficient anomaly detection for online data streams
    He, Shiming
    Guo, Chenxi
    PROCEEDINGS OF THE 2024 27TH INTERNATIONAL CONFERENCE ON COMPUTER SUPPORTED COOPERATIVE WORK IN DESIGN, CSCWD 2024, 2024, : 1201 - 1206
  • [7] Sequential Model-Free Anomaly Detection for Big Data Streams
    Kurt, Mehmet Necip
    Yilmaz, Yasin
    Wang, Xiaodong
    2019 57TH ANNUAL ALLERTON CONFERENCE ON COMMUNICATION, CONTROL, AND COMPUTING (ALLERTON), 2019, : 421 - 425
  • [8] Online Anomaly Detection using Non-Parametric Technique for Big Data Streams in Cloud Collaborative Environment
    Smrithy, G. S.
    Munirathinam, Sathyan
    Balakrishnan, Ramadoss
    2016 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2016, : 1950 - 1955
  • [9] Anomaly Detection Aided Budget Online Classification for Imbalanced Data Streams
    Liang, Xijun
    Song, Xiaoxin
    Qi, Kai
    Liu, Jinyu
    Jian, Ling
    Li, Jundong
    IEEE INTELLIGENT SYSTEMS, 2021, 36 (03) : 14 - 22
  • [10] ONLINE REACTIVE ANOMALY DETECTION OVER STREAM DATA
    Fu, Yan
    Zhou, Jun-Lin
    Wu, Yue
    2008 INTERNATIONAL CONFERENCE ON APPERCEIVING COMPUTING AND INTELLIGENCE ANALYSIS (ICACIA 2008), 2008, : 291 - 294