Outlier Detection over Sliding Windows for Probabilistic Data Streams

Cited by: 13
Authors
Wang, Bin [1]
Yang, Xiao-Chun
Wang, Guo-Ren
Yu, Ge
Affiliations
[1] Northeastern Univ, Sch Informat Sci & Engn, Shenyang 110004, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
outlier detection; uncertain data; probabilistic data stream; sliding window;
DOI
10.1007/s11390-010-9332-2
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Outlier detection is a useful technique in many applications where data is generally uncertain and can be described using probability. Although it has been studied intensively for deterministic data, outlier detection is still novel in the emerging field of uncertain data. In this paper, we study the semantics of outlier detection on probabilistic data streams and present a new definition of a distance-based outlier over a sliding window. We then show that the problem of detecting an outlier over a set of possible world instances is equivalent to the problem of finding the k-th element in its neighborhood. Based on this observation, a dynamic programming algorithm (DPA) is proposed to reduce the detection cost from $O(2^{|R(e; d)|})$ to $O(|k \cdot R(e; d)|)$, where R(e; d) is the d-neighborhood of e. Furthermore, we propose a pruning-based approach (PBA) to effectively and efficiently filter non-outliers on a single window, and to dynamically detect the most recent m elements incrementally. Finally, detailed analysis and thorough experimental results demonstrate the efficiency and scalability of our approach.
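The abstract does not spell out the DPA itself, so the following is only a minimal sketch of the general idea of replacing possible-world enumeration with dynamic programming. It assumes that each element in the d-neighborhood R(e; d) exists independently with a known probability, and that e is treated as an outlier when the probability that fewer than k neighbors exist reaches a threshold. These assumptions, and the names `prob_fewer_than_k`, `is_outlier`, and `threshold`, are illustrative and are not taken from the paper.

```python
from typing import Sequence


def prob_fewer_than_k(neighbor_probs: Sequence[float], k: int) -> float:
    """Probability that fewer than k neighbors exist.

    Assumption (not from the paper): neighbors exist independently,
    neighbor i with probability neighbor_probs[i].
    Runs in O(k * n) time instead of enumerating 2^n possible worlds.
    """
    if k <= 0:
        return 0.0
    # dp[j] = probability that exactly j of the neighbors seen so far exist,
    # tracked only for j < k; mass for counts >= k is simply dropped.
    dp = [0.0] * k
    dp[0] = 1.0
    for p in neighbor_probs:
        # Update in descending j so dp[j - 1] still holds the previous value.
        for j in range(k - 1, 0, -1):
            dp[j] = dp[j] * (1.0 - p) + dp[j - 1] * p
        dp[0] *= 1.0 - p
    return sum(dp)


def is_outlier(neighbor_probs: Sequence[float], k: int, threshold: float) -> bool:
    """Hypothetical decision rule: outlier if P(fewer than k neighbors) >= threshold."""
    return prob_fewer_than_k(neighbor_probs, k) >= threshold
```

For n neighbors the table has k entries and is updated once per neighbor, which gives the O(k · n) cost of this sketch, compared with 2^n possible worlds under direct enumeration.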
Pages: 389-400
Number of pages: 12