There and back again: Outlier detection between statistical reasoning and data mining algorithms

Cited by: 98
Authors
Zimek, Arthur [1]
Filzmoser, Peter [2]
Affiliations
[1] Univ Southern Denmark, Dept Math & Comp Sci, Campusvej 55, DK-5230 Odense M, Denmark
[2] Vienna Univ Technol, Inst Stat & Math Methods Econ, Vienna, Austria
Keywords
anomaly detection; outlier detection; outlier model; statistics and data mining; DISTANCE-BASED OUTLIERS; ANOMALY DETECTION; NOVELTY DETECTION; IDENTIFICATION; FRAMEWORK; EFFICIENT; LOCATION; REJECTION; SELECTION; EXPLORATION;
DOI
10.1002/widm.1280
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Outlier detection has been a topic in statistics for centuries. Mainly over the last two decades, there has also been increasing interest in the database and data mining community in developing scalable methods for outlier detection. Although initially based on statistical reasoning, these methods soon lost the direct probabilistic interpretability of the derived outlier scores. Here, from a joint data mining and statistics point of view, we detail the roots and the path of development of statistical outlier detection and of database-related data mining methods for outlier detection. We discuss their inherent meaning, review approaches to again find a statistically meaningful interpretation of outlier scores, and sketch related current research topics. This article is categorized under: Algorithmic Development > Statistics; Algorithmic Development > Scalable Statistical Methods; Technologies > Machine Learning
Pages: 26