Efficient Anomaly Detection by Isolation Using Nearest Neighbour Ensemble

Cited by: 56
Authors
Bandaragoda, Tharindu R. [1]
Ting, Kai Ming [2]
Albrecht, David [1]
Liu, Fei Tony [1]
Wells, Jonathan R. [1]
Affiliations
[1] Monash Univ, Sch Informat Technol, Clayton, Vic 3800, Australia
[2] Federat Univ, Sch Engn & Informat Technol, Mt Helen, Vic, Australia
Source
2014 IEEE International Conference on Data Mining Workshop (ICDMW) | 2014
DOI
10.1109/ICDMW.2014.70
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Subject classification code
0812
Abstract
This paper presents iNNE (isolation using Nearest Neighbour Ensemble), an efficient nearest neighbour-based anomaly detection method by isolation. iNNE runs significantly faster than existing nearest neighbour-based methods such as Local Outlier Factor, especially in data sets having thousands of dimensions or millions of instances. This is because the proposed method has linear time complexity and constant space complexity. Compared with the existing tree-based isolation method iForest, the proposed isolation method overcomes three weaknesses of iForest that we have identified, i.e., its inability to detect local anomalies, anomalies with a low number of relevant attributes, and anomalies that are surrounded by normal instances.
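The abstract does not spell out the mechanics of iNNE, so the following is only a minimal sketch under stated assumptions: each ensemble member draws a small subsample of size psi, places a hypersphere on every sampled point with radius equal to the distance to its nearest neighbour in the subsample, and scores a query by the smallest hypersphere covering it (score 1 if none does). The function names `fit_inne` and `score_inne`, the parameter defaults, and the scoring formula are illustrative choices, not taken from this record. Scoring one point costs a fixed number of distance computations per member, which is in line with the linear time complexity claimed above.

```python
import numpy as np


def fit_inne(X, n_estimators=100, psi=8, seed=0):
    """Build an ensemble of hypersphere sets (hypothetical helper).

    Assumes X has at least psi rows and no duplicate points, so every
    nearest-neighbour radius is strictly positive.
    """
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_estimators):
        # Subsample psi points without replacement to act as centres.
        centres = X[rng.choice(len(X), size=psi, replace=False)]
        # Pairwise distances between centres; ignore self-distances.
        dist = np.linalg.norm(centres[:, None, :] - centres[None, :, :], axis=2)
        np.fill_diagonal(dist, np.inf)
        nn = dist.argmin(axis=1)
        radius = dist[np.arange(psi), nn]
        # Assumed isolation score of a hypersphere: one minus the ratio of
        # its nearest neighbour's radius to its own radius.
        sphere_score = 1.0 - radius[nn] / radius
        models.append((centres, radius, sphere_score))
    return models


def score_inne(models, X):
    """Average isolation score per row of X; higher means more anomalous."""
    total = np.zeros(len(X))
    for centres, radius, sphere_score in models:
        dist = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        covered = dist <= radius[None, :]
        # Among covering hyperspheres, pick the one with the smallest radius.
        masked = np.where(covered, radius[None, :], np.inf)
        best = masked.argmin(axis=1)
        # Points outside every hypersphere get the maximum score of 1.
        total += np.where(covered.any(axis=1), sphere_score[best], 1.0)
    return total / len(models)


# Usage: fit on a synthetic Gaussian cluster, then score a distant query.
rng = np.random.default_rng(42)
data = rng.normal(size=(300, 2))
ensemble = fit_inne(data, n_estimators=50, psi=8, seed=42)
query_scores = score_inne(ensemble, np.array([[0.0, 0.0], [10.0, 10.0]]))
```

In this sketch the stored model size depends only on the number of members and psi, not on the number of training instances, which is one way to read the constant space complexity mentioned in the abstract.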
Pages: 698-705
Page count: 8