Local outlier detection reconsidered: a generalized view on locality with applications to spatial, video, and network outlier detection

Cited by: 203
Authors
Schubert, Erich [1]
Zimek, Arthur [2]
Kriegel, Hans-Peter [1]
Affiliations
[1] Univ Munich, D-80538 Munich, Germany
[2] Univ Alberta, Dept Comp Sci, Edmonton, AB T6G 2E8, Canada
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
Local outlier; Spatial outlier; Video outlier; Network outlier; ALGORITHMS; FRAMEWORK;
DOI
10.1007/s10618-012-0300-z
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Outlier detection research has been seeing many new algorithms every year that often appear to be only slightly different from existing methods, along with some experiments that show them to "clearly outperform" the others. However, few approaches come along with a clear analysis of existing methods and a solid theoretical differentiation. Here, we provide a formalized method of analysis to allow for a theoretical comparison and generalization of many existing methods. Our unified view improves understanding of the shared properties and of the differences of outlier detection models. By abstracting the notion of locality from the classic distance-based notion, our framework facilitates the construction of abstract methods for many special data types that are usually handled with specialized algorithms. In particular, spatial neighborhood can be seen as a special case of locality. Here we therefore compare and generalize approaches to spatial outlier detection in a detailed manner. We also discuss temporal data like video streams, or graph data such as community networks. Since we reproduce results of specialized approaches with our general framework, and even improve upon them, our framework provides reasonable baselines to evaluate the true merits of specialized approaches. At the same time, seeing spatial outlier detection as a special case of local outlier detection opens up new potential for analysis and advancement of methods.
Pages: 190-237
Number of pages: 48