A study on anomaly detection ensembles

Cited by: 16
Authors
Chiang, Alvin [1]
David, Esther [2]
Lee, Yuh-Jye [3]
Leshem, Guy [2]
Yeh, Yi-Ren [4]
Affiliations
[1] Natl Taiwan Univ Sci & Technol, Taipei, Taiwan
[2] Ashkelon Acad Coll, Ashqelon, Israel
[3] Natl Kaohsiung Normal Univ, Kaohsiung, Taiwan
[4] Natl Chiao Tung Univ, Hsinchu, Taiwan
Keywords
Ensemble; Machine learning; Outlier algorithm classification
DOI
10.1016/j.jal.2016.12.002
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
An anomaly, or outlier, is an object exhibiting differences that suggest it belongs to an as-yet undefined class or category. Early detection of anomalies often proves of great importance because they may correspond to events such as fraud, spam, or device malfunctions. By automating the creation of a ranking or list of deviations, we can save time and decrease the cognitive overload of the individuals or groups responsible for responding to such events. Over the years many anomaly and outlier metrics have been developed. In this paper we propose a clustering-based score ensembling method for outlier detection. Using benchmark datasets we evaluate quantitatively the robustness and accuracy of different ensemble strategies. We find that ensembling strategies offer only limited value for increasing overall performance, but provide robustness by negating the influence of severely underperforming models. (C) 2017 Elsevier B.V. All rights reserved.
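The robustness claim in the abstract can be illustrated with a toy example. The sketch below is not the clustering-based method proposed in the paper, whose details are not given in this record; it assumes a plain z-score-and-average ensemble, and the three detectors and their scores are hypothetical values chosen for illustration. It shows how averaging normalized scores can offset one badly underperforming detector.

```python
from statistics import mean, pstdev


def zscore(scores):
    """Standardize one detector's scores so detectors are comparable."""
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


def average_ensemble(score_lists):
    """Average the standardized scores of several detectors per object."""
    normalized = [zscore(scores) for scores in score_lists]
    return [mean(column) for column in zip(*normalized)]


def rank_by_score(scores):
    """Return object indices ordered from most to least anomalous."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Hypothetical scores for five objects; object 4 is the true anomaly.
detector_a = [0.10, 0.20, 0.15, 0.10, 0.90]
detector_b = [1.00, 1.20, 0.90, 1.10, 5.00]
detector_c = [0.50, 0.10, 0.40, 0.30, 0.20]  # underperforming detector

combined = average_ensemble([detector_a, detector_b, detector_c])
ranking = rank_by_score(combined)
print("ensemble ranking:", ranking)
print("detector_c alone:", rank_by_score(detector_c))
```

In this toy data the weak detector on its own places a normal object first, while the averaged ensemble still ranks object 4 as the most anomalous.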
Pages: 1-13
Page count: 13