Real-time Outlier Detection over Streaming Data

Cited by: 7
Authors
Yu, Kangqing [1]
Shi, Wei [2]
Santoro, Nicola [1]
Ma, Xiangyu [2]
Affiliations
[1] Carleton Univ, Sch Comp Sci, Ottawa, ON, Canada
[2] Carleton Univ, Sch Informat Technol, Ottawa, ON, Canada
Keywords
outlier detections; streaming data; parallel processing; sliding-window; CUDA
DOI
10.1109/SmartWorld-UIC-ATC-SCALCOM-IOP-SCI.2019.00063
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Designing outlier detection algorithms for streaming data involves several issues, such as concept drift, temporal context, transience, and uncertainty. Moreover, to produce results in real time with limited memory, such data must be processed in an online fashion. Real-time outlier detection over streaming data is therefore more challenging than performing the same task on batches of data. Several methods have been proposed to detect outliers over streaming data, among which the sliding window technique is frequently used. In this technique, only a chunk of the data is kept in memory at each point in time and used to build predictive models. The amount of data held in memory at one time is referred to as the size of the sliding window. The correctness of the outlier detection results depends largely on the choice of window size. Other similar techniques exist, but most of them fail to address the properties of streaming data and thus produce results with poor accuracy. In this paper, we present an online outlier detection algorithm that addresses the aforementioned challenges. The proposed algorithm adopts the sliding window technique but also efficiently mines, in memory, a statistical summary of previously observed data, which contributes to the prediction of incoming data. It further addresses the concept drift problem that exists in streaming data. We evaluated the accuracy of our algorithm on both synthetic and real-world datasets. Results show that the proposed method detects outliers over streaming data with higher accuracy than the SOD GPU algorithm proposed in [9], even when concept drift occurs. The algorithm does not require secondary memory for processing and is further accelerated on the GPU using CUDA.
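The sliding window idea described in the abstract can be illustrated with a minimal Python sketch. This is not the algorithm proposed in the paper: the class name `SlidingWindowDetector`, the `window_size` and `threshold` parameters, and the simple z-score rule are all assumptions chosen for illustration. The sketch keeps only the most recent values in memory, maintains a running sum and sum of squares as a small statistical summary of the window, and scores each incoming value against that summary.

```python
from collections import deque
import math


class SlidingWindowDetector:
    """Hypothetical illustration of sliding-window outlier detection.

    A value is flagged when it lies more than `threshold` standard
    deviations from the mean of the values currently in the window.
    This z-score rule is an assumption, not the paper's method.
    """

    def __init__(self, window_size, threshold=3.0):
        self.window = deque()
        self.window_size = window_size
        self.threshold = threshold
        self.total = 0.0     # running sum of the values in the window
        self.total_sq = 0.0  # running sum of their squares

    def update(self, x):
        """Score x against the current window, then add x to the window."""
        is_outlier = False
        n = len(self.window)
        if n >= 2:
            mean = self.total / n
            # Population variance from the running sums, clamped at zero
            # to guard against small negative values from rounding.
            var = max(self.total_sq / n - mean * mean, 0.0)
            std = math.sqrt(var)
            if std > 0:
                is_outlier = abs(x - mean) > self.threshold * std

        # Slide the window: append the new value, evict the oldest if full.
        self.window.append(x)
        self.total += x
        self.total_sq += x * x
        if len(self.window) > self.window_size:
            old = self.window.popleft()
            self.total -= old
            self.total_sq -= old * old
        return is_outlier


detector = SlidingWindowDetector(window_size=5)
stream = [10, 12, 11, 10, 12, 11, 50, 10]
flags = [detector.update(x) for x in stream]
print(flags)  # only the value 50 is flagged
```

In this sketch a flagged value still enters the window, so it inflates the mean and variance until it is evicted. Memory use stays bounded by `window_size`, and a smaller window adapts faster to a changing distribution at the cost of noisier statistics, which reflects the abstract's point that result quality depends on the choice of window size.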
Pages: 125-132
Number of pages: 8
Related papers
50 in total
  • [21] Real-time streaming of environmental field data
    Vivoni, ER
    Camilli, R
    COMPUTERS & GEOSCIENCES, 2003, 29 (04) : 457 - 468
  • [22] A real-time adaptive network intrusion detection for streaming data: a hybrid approach
    Saeed, Mozamel M.
    NEURAL COMPUTING & APPLICATIONS, 2022, 34 (08): : 6227 - 6240
  • [23] A real-time adaptive network intrusion detection for streaming data: a hybrid approach
    Mozamel M. Saeed
    Neural Computing and Applications, 2022, 34 : 6227 - 6240
  • [24] Real-time caption streaming over WiFi network
    Maniezzo, D
    Cesana, A
    Bergamo, P
    Gerla, A
    Yao, K
    ITRE2003: INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY: RESEARCH AND EDUCATION, 2003, : 316 - 320
  • [25] Event detection from real-time twitter streaming data using community detection algorithm
    Jagrati Singh
    Digvijay Pandey
    Anil Kumar Singh
    Multimedia Tools and Applications, 2024, 83 : 23437 - 23464
  • [26] Event detection from real-time twitter streaming data using community detection algorithm
    Singh, Jagrati
    Pandey, Digvijay
    Singh, Anil Kumar
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 83 (8) : 23437 - 23464
  • [27] Shape based kinetic outlier detection in real-time PCR
    Davide Sisti
    Michele Guescini
    Marco BL Rocchi
    Pasquale Tibollo
    Mario D'Atri
    Vilberto Stocchi
    BMC Bioinformatics, 11
  • [28] Outlier Detection in Streaming Data A research Perspective
    Chugh, Neeraj
    Chugh, Mitali
    Agarwal, Alok
    2014 INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED AND GRID COMPUTING (PDGC), 2014, : 429 - 432
  • [29] Shape based kinetic outlier detection in real-time PCR
    Sisti, Davide
    Guescini, Michele
    Rocchi, Marco B. L.
    Tibollo, Pasquale
    D'Atri, Mario
    Stocchi, Vilberto
    BMC BIOINFORMATICS, 2010, 11
  • [30] A dynamic balanced quadtree for real-time streaming data
    Yang, Guang
    Wu, Xia
    Zhang, Jing
    KNOWLEDGE-BASED SYSTEMS, 2023, 263