Discovering cluster-based local outliers

Cited by: 638
Authors
He, ZY [1]
Xu, XF [1]
Deng, SC [1]
Affiliation
[1] Harbin Inst Technol, Dept Comp Sci & Engn, Harbin 150001, Peoples R China
Keywords
outlier detection; clustering; data mining
DOI
10.1016/S0167-8655(03)00003-5
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we present a new definition of outlier, the cluster-based local outlier, which is meaningful and gives importance to local data behavior. We design a measure for identifying the physical significance of an outlier, called the cluster-based local outlier factor (CBLOF). We also propose the FindCBLOF algorithm for discovering outliers. Experimental results show that our approach outperforms existing methods in identifying meaningful and interesting outliers. (C) 2003 Elsevier Science B.V. All rights reserved.
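The abstract names the CBLOF measure without giving its formula. The following is a rough, hedged sketch of the commonly stated idea. Clusters are sorted by size and split into "large" and "small" at a boundary controlled by two parameters, α and β. Each record is then scored by the size of its own cluster times its distance to the nearest large cluster, which is its own cluster if that cluster is large. Several details are assumptions and are not taken from this record. The sketch assumes numeric records, Euclidean distance to cluster centroids, and cluster labels supplied by an arbitrary prior clustering step. It also takes the boundary as the first position that satisfies either the coverage (α) or the size-ratio (β) condition, and it uses defaults of α = 0.9 and β = 5. The function names `centroid` and `cblof_scores` are hypothetical.

```python
import math
from collections import Counter


def centroid(points):
    """Component-wise mean of a list of equal-length numeric tuples."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def cblof_scores(data, labels, alpha=0.9, beta=5.0):
    """Sketch of cluster-based local outlier factor (CBLOF) scores.

    Assumptions (not from the source record): numeric tuples, Euclidean
    distance to cluster centroids, labels from any prior clustering step,
    and hypothetical defaults alpha=0.9, beta=5.
    """
    sizes = Counter(labels)
    # Sort cluster ids so that |C1| >= |C2| >= ... >= |Ck|.
    order = sorted(sizes, key=lambda c: sizes[c], reverse=True)
    n = len(data)

    # Boundary b (assumed rule): the first position where the leading
    # clusters cover at least alpha * n records, or where |Cb| / |Cb+1| >= beta.
    b = len(order)
    covered = 0
    for i, c in enumerate(order):
        covered += sizes[c]
        ratio_ok = i + 1 < len(order) and sizes[c] / sizes[order[i + 1]] >= beta
        if covered >= alpha * n or ratio_ok:
            b = i + 1
            break
    large = set(order[:b])

    # Group records by cluster and take each cluster's centroid.
    members = {c: [] for c in order}
    for x, c in zip(data, labels):
        members[c].append(x)
    centers = {c: centroid(pts) for c, pts in members.items()}

    scores = []
    for x, c in zip(data, labels):
        if c in large:
            # Record in a large cluster: distance to its own cluster's centroid.
            d = math.dist(x, centers[c])
        else:
            # Record in a small cluster: distance to the nearest large cluster.
            d = min(math.dist(x, centers[lc]) for lc in large)
        # Weight the distance by the size of the record's own cluster.
        scores.append(sizes[c] * d)
    return scores
```

To get a FindCBLOF-style list of candidate outliers, rank records by descending score and keep the top k, for example with `sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:k]`. In a toy set with two five-point clusters and one isolated point, the isolated point falls into a small cluster and receives the highest score.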
Pages: 1641-1650
Number of pages: 10