Online Anomaly Detection over Big Data Streams

Authors
Rettig, Laura [1, 2]
Khayati, Mourad [2 ]
Cudre-Mauroux, Philippe [2 ]
Piorkowski, Michal [1 ]
Affiliations
[1] Big Data & Business Intelligence Competence Ctr S, Bern, Switzerland
[2] Univ Fribourg, eXascale Infolab, CH-1700 Fribourg, Switzerland
Keywords
SKETCH;
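DOI
Not available
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract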
Data quality is a challenging problem in many real-world application domains. While a lot of attention has been given to detecting anomalies in data at rest, detecting anomalies in streaming applications still largely remains an open problem. For applications involving several data streams, the challenge of detecting anomalies has become harder over time, as data can dynamically evolve in subtle ways following changes in the underlying infrastructure. In this paper, we describe and empirically evaluate an online anomaly detection pipeline that satisfies two key conditions: generality and scalability. Our technique works on numerical data as well as on categorical data and makes no assumption about the underlying data distributions. We implement two metrics, relative entropy and Pearson correlation, to dynamically detect anomalies. Together, these two metrics provide efficient and effective detection of anomalies over high-velocity streams of events. In the following, we describe the design and implementation of our approach in a Big Data scenario using state-of-the-art streaming components. Specifically, we build on Kafka queues and Spark Streaming to realize our approach while satisfying the generality and scalability requirements given above. We show how a combination of the two metrics we put forward can be applied to detect several types of anomalies (such as infrastructure failures, hardware misconfiguration, or user-driven anomalies) in large-scale telecommunication networks. We also discuss the merits and limitations of the resulting architecture and empirically evaluate its scalability on a real deployment over live streams capturing events from millions of mobile devices.
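The abstract names relative entropy and Pearson correlation as the two detection metrics but does not say how they are computed. The following is a minimal sketch of both in plain Python, under stated assumptions: the function names, the additive smoothing constant `eps`, and the sample event counts are all hypothetical, and the code runs on in-memory data rather than on the Kafka and Spark Streaming pipeline the paper describes.

```python
import math
from collections import Counter


def relative_entropy(current, baseline, eps=1e-9):
    """Relative entropy D(P || Q) in bits between two event-count maps.

    P is estimated from `current`, Q from `baseline`. A small additive
    constant `eps` (an assumption of this sketch) keeps categories that
    are missing from one window from producing log(0) or division by zero.
    """
    keys = set(current) | set(baseline)
    p_total = sum(current.values()) + eps * len(keys)
    q_total = sum(baseline.values()) + eps * len(keys)
    divergence = 0.0
    for key in keys:
        p = (current.get(key, 0) + eps) / p_total
        q = (baseline.get(key, 0) + eps) / q_total
        divergence += p * math.log2(p / q)
    return divergence


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series has no defined correlation; report 0
    return cov / math.sqrt(var_x * var_y)


# Hypothetical event-type counts for a baseline window and a later window.
baseline = Counter({"call": 50, "sms": 30, "data": 20})
shifted = Counter({"call": 5, "sms": 5, "data": 90})

same_window_score = relative_entropy(baseline, baseline)
shifted_score = relative_entropy(shifted, baseline)
```

In this sketch a window whose event-type mix matches the baseline scores zero, while a shifted mix scores higher; a detector would compare the score against a threshold, whose choice the abstract does not specify. Relative entropy applies to categorical counts and Pearson correlation to paired numerical series, which mirrors the abstract's claim that the approach covers both data types.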
Pages: 1113-1122
Number of pages: 10
Related papers
50 in total
  • [21] A User Behavior Anomaly Detection Approach based on Sequence Mining over Data Streams
    Zhou, Yong
    Wang, Yijie
    Ma, Xingkong
    2016 17TH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED COMPUTING, APPLICATIONS AND TECHNOLOGIES (PDCAT), 2016, : 376 - 381
  • [22] EADetection: An efficient and accurate sequential behavior anomaly detection approach over data streams
    Cheng, Li
    Wang, Yijie
    Zhou, Yong
    Ma, Xingkong
    INTERNATIONAL JOURNAL OF DISTRIBUTED SENSOR NETWORKS, 2018, 14 (10)
  • [23] Self-Supervised Learning for Online Anomaly Detection in High-Dimensional Data Streams
    Mozaffari, Mahsa
    Doshi, Keval
    Yilmaz, Yasin
    ELECTRONICS, 2023, 12 (09)
  • [24] Contextual Anomaly Detection in Big Sensor Data
    Hayes, Michael A.
    Capretz, Miriam A. M.
    2014 IEEE INTERNATIONAL CONGRESS ON BIG DATA (BIGDATA CONGRESS), 2014, : 64 - 71
  • [25] Anomaly Detection for Big Data Security: A Benchmark
    Es-Samaali, Hamza H.
    Outchakoucht, Aissam A.
    Benhadou, Siham S.
    Mounnan, Oussama O.
    Abou El Kalam, Anas A.
    2021 THE 3RD INTERNATIONAL CONFERENCE ON BIG DATA ENGINEERING AND TECHNOLOGY, BDET 2021, 2021, : 35 - 39
  • [26] Big Data Analytics for Anomaly Detection in Blockchain
    Ozbilen, Mahmut Lutfullah
    Ozcan, Elif
    Keles, Mustafa Berk
    Zeybel, Merve
    Dervisoglu, Havanur
    Dogan, Aslinur
    Haklidir, Mehmet
    2023 31ST SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE, SIU, 2023,
  • [27] Anomaly Detection on Data Streams for Machine Condition Monitoring
    Brandt, Tobias
    Grawunder, Marco
    Appelrath, Hans-Juergen
    2016 IEEE 14TH INTERNATIONAL CONFERENCE ON INDUSTRIAL INFORMATICS (INDIN), 2016, : 1282 - 1287
  • [28] Solution Pattern for Anomaly Detection in Financial Data Streams
    Zakrzewicz, Maciej
    Wojciechowski, Marek
    Glawinski, Pawel
    NEW TRENDS IN DATABASES AND INFORMATION SYSTEMS, ADBIS 2019, 2019, 1064 : 77 - 84
  • [29] Evolving Fuzzy Rules for Anomaly Detection in Data Streams
    Moshtaghi, Masud
    Bezdek, James C.
    Leckie, Christopher
    Karunasekera, Shanika
    Palaniswami, Marimuthu
    IEEE TRANSACTIONS ON FUZZY SYSTEMS, 2015, 23 (03) : 688 - 700
  • [30] Anomaly Detection in Data Streams: The Petrol Station Simulator
    Gorawska, Anna
    Pasterak, Krzysztof
    BEYOND DATABASES, ARCHITECTURES AND STRUCTURES, BDAS 2016, 2016, 613 : 727 - 736