Real-time Outlier Detection over Streaming Data

被引：7

作者：

Yu, Kangqing ^{[1
]}

Shi, Wei ^{[2
]}

Santoro, Nicola ^{[1
]}

Ma, Xiangyu ^{[2
]}

机构：

[1] Carleton Univ, Sch Comp Sci, Ottawa, ON, Canada

[2] Carleton Univ, Sch Informat Technol, Ottawa, ON, Canada

来源：

2019 IEEE SMARTWORLD, UBIQUITOUS INTELLIGENCE & COMPUTING, ADVANCED & TRUSTED COMPUTING, SCALABLE COMPUTING & COMMUNICATIONS, CLOUD & BIG DATA COMPUTING, INTERNET OF PEOPLE AND SMART CITY INNOVATION (SMARTWORLD/SCALCOM/UIC/ATC/CBDCOM/IOP/SCI 2019) | 2019年

关键词：

outlier detections; streaming data; parallel processing; sliding-window; CUDA;

D O I：

10.1109/SmartWorld-UIC-ATC-SCALCOM-IOP-SCI.2019.00063

中图分类号：

TP18 [人工智能理论];

学科分类号：

081104 ; 0812 ; 0835 ; 1405 ;

摘要：

Designing outlier detection algorithms over streaming data involves several issues such as concept drift, temporal context, transience, uncertainty, etc. Moreover, to produce results in real-time with limited memory resources, the processing of such data must occur in an online fashion. Therefore, real time detection of outliers on streaming data faces more challenges than performing the same task on batches of data. Several methods have been proposed to detect outliers over streaming data, among which a sliding window technique is frequently used. In this technique, only a chunk of data is kept in memory at each point in time and used to build predictive models. The size of the data in memory simultaneously is referred to as the size of a sliding window. The correctness of the outlier detection results depends largely on the choice of window size. Other similar techniques exist but most of them fail to address the properties of streaming data, and thus produce results exhibiting poor accuracy. In this paper, we present an online outlier detection algorithm, that addresses the aforementioned challenges. The proposed algorithm adopts the sliding window technique, however efficiently mines in memory a statistical summary of previous observed data, which contributes to the prediction of incoming data. It further addresses the concept drift problem that exists in streaming data. We evaluated the accuracy of our algorithm on both synthetic and real-world datasets. Results show that the proposed method detects outliers over streaming data with higher accuracy than SOD GPU algorithm proposed in [9], even when concept drifts occur. The algorithm does not require a secondary memory for processing and is further accelerated using CUDA GPU.

引用

页码：125 / 132

页数：8

共 19 条

[1] Outlier mining in large high-dimensional data sets [J].