Online Anomaly Detection over Big Data Streams

Cited by: 0
Authors
Rettig, Laura [1, 2]
Khayati, Mourad [2]
Cudre-Mauroux, Philippe [2]
Piorkowski, Michal [1]
Affiliations
[1] Big Data & Business Intelligence Competence Ctr S, Bern, Switzerland
[2] Univ Fribourg, eXascale Infolab, CH-1700 Fribourg, Switzerland
Keywords
SKETCH;
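DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract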
Data quality is a challenging problem in many real-world application domains. While a lot of attention has been given to detecting anomalies in data at rest, detecting anomalies in streaming applications still largely remains an open problem. For applications involving several data streams, the challenge of detecting anomalies has become harder over time, as data can dynamically evolve in subtle ways following changes in the underlying infrastructure. In this paper, we describe and empirically evaluate an online anomaly detection pipeline that satisfies two key conditions: generality and scalability. Our technique works on numerical data as well as on categorical data and makes no assumption on the underlying data distributions. We implement two metrics, relative entropy and Pearson correlation, to dynamically detect anomalies. The two metrics we use provide an efficient and effective detection of anomalies over high-velocity streams of events. In the following, we describe the design and implementation of our approach in a Big Data scenario using state-of-the-art streaming components. Specifically, we build on Kafka queues and Spark Streaming for realizing our approach while satisfying the generality and scalability requirements given above. We show how a combination of the two metrics we put forward can be applied to detect several types of anomalies (such as infrastructure failures, hardware misconfiguration, or user-driven anomalies) in large-scale telecommunication networks. We also discuss the merits and limitations of the resulting architecture and empirically evaluate its scalability on a real deployment over live streams capturing events from millions of mobile devices.
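The two metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, which the abstract says runs on Kafka and Spark Streaming; it only shows, on small in-memory windows, how relative entropy over categorical event types and Pearson correlation over numerical series could be computed. The function names, the window representation as plain lists, and the `eps` floor for unseen categories are all assumptions made for illustration.

```python
import math
from collections import Counter


def relative_entropy(current, reference, eps=1e-9):
    """Relative entropy D(P || Q) in bits between two windows of events.

    `current` and `reference` are lists of categorical event labels
    (hypothetical representation). `eps` is an assumed floor for
    categories that never occur in the reference window; it avoids
    division by zero but does not renormalize Q.
    """
    if not current or not reference:
        raise ValueError("both windows must contain at least one event")
    p_counts, q_counts = Counter(current), Counter(reference)
    p_total, q_total = len(current), len(reference)
    divergence = 0.0
    for label, count in p_counts.items():
        p = count / p_total
        q = max(q_counts[label] / q_total, eps)
        divergence += p * math.log2(p / q)
    return divergence


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumed convention: a constant series has no correlation
    return cov / math.sqrt(var_x * var_y)
```

Under these assumptions, a window whose event-type distribution drifts from the reference yields a relative entropy above zero, and a drop in correlation between two streams that normally move together could be flagged for inspection. Any thresholds would have to be chosen per deployment; the abstract does not state them.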
Pages: 1113 - 1122
Page count: 10
Related papers
50 in total
  • [41] Online updating Huber robust regression for big data streams
    Tao, Chunbai
    Wang, Shanshan
    STATISTICS, 2024, 58 (05) : 1197 - 1223
  • [42] Rethinking elastic online scheduling of big data streaming applications over high-velocity continuous data streams
    Sun, Dawei
    Yan, Hongbin
    Gao, Shang
    Liu, Xunyun
    Buyya, Rajkumar
    JOURNAL OF SUPERCOMPUTING, 2018, 74 (02) : 615 - 636
  • [43] Rethinking elastic online scheduling of big data streaming applications over high-velocity continuous data streams
    Dawei Sun
    Hongbin Yan
    Shang Gao
    Xunyun Liu
    Rajkumar Buyya
    The Journal of Supercomputing, 2018, 74 : 615 - 636
  • [44] The Analysis of Online Event Streams: Predicting the Next Activity for Anomaly Detection
    Lee, Suhwan
    Lu, Xixi
    Reijers, Hajo A.
    RESEARCH CHALLENGES IN INFORMATION SCIENCE, 2022, 446 : 248 - 264
  • [45] Perspective of anomaly detection in big data for data quality improvement
    Keskar, Vinaya
    Yadav, Jyoti
    Kumar, Ajay
    MATERIALS TODAY-PROCEEDINGS, 2022, 51 : 532 - 537
  • [46] Visualization of Data Cubes for Anomaly Detection in Network Traffic Data Streams
    Ahlers, Volker
    Laue, Tim
    Wellermann, Nils
    Heine, Felix
    PROCEEDINGS OF THE 11TH IEEE INTERNATIONAL CONFERENCE ON INTELLIGENT DATA ACQUISITION AND ADVANCED COMPUTING SYSTEMS: TECHNOLOGY AND APPLICATIONS (IDAACS'2021), VOL 1, 2021, : 272 - 277
  • [47] Online Anomaly Detection in Microbiological Data Sets
    Hannig, Leonie
    Weise, Lukas
    Wittmann, Jochen
    ADVANCES AND NEW TRENDS IN ENVIRONMENTAL INFORMATICS: ICT FOR SUSTAINABLE SOLUTIONS, 2020, : 149 - 163
  • [48] Semantic anomaly detection in online data sources
    Raz, O
    Koopman, P
    Shaw, M
    ICSE 2002: PROCEEDINGS OF THE 24TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, 2002, : 302 - 312
  • [49] Emergent pattern detection algorithm for big data streams
    Fahed, Lina
    Alfalou, Ayman
    PATTERN RECOGNITION AND TRACKING XXXI, 2020, 11400
  • [50] Online event recognition over noisy data streams
    Mantenoglou, Periklis
    Artikis, Alexander
    Paliouras, Georgios
    INTERNATIONAL JOURNAL OF APPROXIMATE REASONING, 2023, 161