Real-time Outlier Detection over Streaming Data

Cited by: 7
Authors
Yu, Kangqing [1]
Shi, Wei [2]
Santoro, Nicola [1]
Ma, Xiangyu [2]
Affiliations
[1] Carleton Univ, Sch Comp Sci, Ottawa, ON, Canada
[2] Carleton Univ, Sch Informat Technol, Ottawa, ON, Canada
Source
2019 IEEE SMARTWORLD, UBIQUITOUS INTELLIGENCE & COMPUTING, ADVANCED & TRUSTED COMPUTING, SCALABLE COMPUTING & COMMUNICATIONS, CLOUD & BIG DATA COMPUTING, INTERNET OF PEOPLE AND SMART CITY INNOVATION (SMARTWORLD/SCALCOM/UIC/ATC/CBDCOM/IOP/SCI 2019) | 2019
Keywords
outlier detections; streaming data; parallel processing; sliding-window; CUDA;
DOI
10.1109/SmartWorld-UIC-ATC-SCALCOM-IOP-SCI.2019.00063
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Designing outlier detection algorithms over streaming data involves several issues, such as concept drift, temporal context, transience, and uncertainty. Moreover, to produce results in real time with limited memory resources, such data must be processed in an online fashion. Real-time detection of outliers on streaming data therefore faces more challenges than performing the same task on batches of data. Several methods have been proposed to detect outliers over streaming data, among which the sliding window technique is frequently used. In this technique, only a chunk of the data is kept in memory at each point in time and used to build predictive models. The amount of data held in memory at one time is referred to as the size of the sliding window. The correctness of the outlier detection results depends largely on the choice of window size. Other similar techniques exist, but most of them fail to address the properties of streaming data and thus produce results with poor accuracy. In this paper, we present an online outlier detection algorithm that addresses the aforementioned challenges. The proposed algorithm adopts the sliding window technique but also efficiently mines, in memory, a statistical summary of previously observed data, which contributes to the prediction of incoming data. It further addresses the concept drift problem that exists in streaming data. We evaluated the accuracy of our algorithm on both synthetic and real-world datasets. Results show that the proposed method detects outliers over streaming data with higher accuracy than the SOD GPU algorithm proposed in [9], even when concept drift occurs. The algorithm does not require secondary memory for processing and is further accelerated on a GPU using CUDA.
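The generic sliding window idea described in the abstract can be illustrated with a minimal sketch. This is not the algorithm proposed in the paper: it keeps no statistical summary of older data, does not handle concept drift, and does not use CUDA. The function name `sliding_window_outliers`, the parameters `window_size` and `threshold`, and the z-score rule are all assumptions chosen for illustration.

```python
from collections import deque
from math import sqrt


def sliding_window_outliers(stream, window_size=50, threshold=3.0):
    """Yield (index, value) for points that deviate strongly from the current window.

    Hypothetical illustration only: a point is flagged when it lies more than
    `threshold` standard deviations from the mean of the last `window_size` points.
    """
    # Only the most recent `window_size` values are kept in memory;
    # deque(maxlen=...) evicts the oldest value automatically.
    window = deque(maxlen=window_size)
    for i, x in enumerate(stream):
        # Start scoring only once the window is full.
        if len(window) == window_size:
            mean = sum(window) / window_size
            var = sum((v - mean) ** 2 for v in window) / window_size
            std = sqrt(var)
            if std > 0 and abs(x - mean) > threshold * std:
                yield i, x
        # Simplification: flagged points still enter the window.
        window.append(x)


if __name__ == "__main__":
    data = [1.0, 1.1, 0.9, 1.0, 1.2, 0.8, 1.0, 50.0, 1.1, 0.9]
    print(list(sliding_window_outliers(data, window_size=5)))
```

In this sketch the result depends directly on `window_size`, which mirrors the abstract's remark that correctness depends largely on the choice of window size: a flagged point stays in the window for the next `window_size` steps and inflates the variance during that time.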
Pages: 125-132
Number of pages: 8
References
19 in total
  • [1] Outlier mining in large high-dimensional data sets
    Angiulli, F
    Pizzuti, C
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2005, 17 (02) : 203 - 215
  • [2] [Anonymous], 1994, Journal of Computational and Graphical Statistics
  • [3] Selecting the forgetting factor in subset autoregressive modelling
    Brailsford, TJ
    Penm, JHW
    Terrell, RD
    [J]. JOURNAL OF TIME SERIES ANALYSIS, 2002, 23 (06) : 629 - 649
  • [4] Cao L, 2014, PROC INT CONF DATA, P76, DOI 10.1109/ICDE.2014.6816641
  • [5] Curiac Daniel-Ioan, 2007, Proceedings of the Integrated Communications, Navigation, and Surveillance Conference, V7, P83
  • [6] Dhaliwal P, 2010, arXiv, arXiv:1002.4003
  • [7] Efficient Clustering-Based Outlier Detection Algorithm for Dynamic Data Stream
    Elahi, Manzoor
    Li, Kun
    Nisar, Wasif
    Lv, Xinjie
    Wang, Hongan
    [J]. FIFTH INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY, VOL 5, PROCEEDINGS, 2008 : 298 - 304
  • [8] Georgiadis Dimitrios, 2013, Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, P1061
  • [9] HewaNadungodage C, 2016, INT PARALL DISTRIB P, P1133, DOI 10.1109/IPDPS.2016.101
  • [10] Research issues in data stream association rule mining
    Jiang, N
    Gruenwald, L
    [J]. SIGMOD RECORD, 2006, 35 (01) : 14 - 19