A comparative evaluation of outlier detection algorithms: Experiments and analyses

Cited by: 312
Authors
Domingues, Remi [1]
Filippone, Maurizio [1]
Michiardi, Pietro [1]
Zouaoui, Jihane [2]
Affiliations
[1] EURECOM, Dept Data Sci, Sophia Antipolis, France
[2] Amadeus, Sophia Antipolis, France
Keywords
Outlier detection; Fraud detection; Novelty detection; Variational inference; Isolation forest
DOI
10.1016/j.patcog.2017.09.037
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We survey unsupervised machine learning algorithms in the context of outlier detection. This task challenges state-of-the-art methods from a variety of research fields to applications including fraud detection, intrusion detection, medical diagnoses and data cleaning. The selected methods are benchmarked on publicly available datasets and novel industrial datasets. Each method is then submitted to extensive scalability, memory consumption and robustness tests in order to build a full overview of the algorithms' characteristics. (C) 2017 Elsevier Ltd. All rights reserved.
Pages: 406-421
Page count: 16