OutRank: A graph-based outlier detection framework using random walk

Cited by: 47
Authors
Moonesinghe, H. D. K. [1]
Tan, Pang-Ning [1]
Affiliations
[1] Michigan State University, Department of Computer Science and Engineering, East Lansing, MI 48824, USA
Keywords
outlier detection; random walk; Markov chain
DOI
10.1142/S0218213008003753
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper introduces a stochastic graph-based algorithm, called OutRank, for detecting outliers in data. We consider two approaches for constructing a graph representation of the data, based on the object similarity and number of shared neighbors between objects. The heart of this approach is the Markov chain model that is built upon this graph, which assigns an outlier score to each object. Using this framework, we show that our algorithm is more robust than the existing outlier detection schemes and can effectively address the inherent problems of such schemes. Empirical studies conducted on both real and synthetic data sets show that significant improvements in detection rate and false alarm rate are achieved using the proposed framework.
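The idea in the abstract (a similarity graph, a Markov chain built on it, and a per-object score) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the similarity measure, the damping term, or the convergence settings, so cosine similarity, `damping=0.1`, and every function name below are assumptions made for illustration. The sketch also assumes that objects the random walk visits least often (lowest stationary probability) are the most outlying.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity, clamped to be non-negative so it can act as an edge weight."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return max(0.0, dot / (na * nb)) if na and nb else 0.0


def transition_matrix(points):
    """Row-normalize pairwise similarities into Markov transition probabilities."""
    n = len(points)
    rows = []
    for i in range(n):
        sims = [0.0 if i == j else cosine_similarity(points[i], points[j])
                for j in range(n)]
        total = sum(sims)
        # An object with no similar neighbor jumps uniformly (assumed fallback).
        rows.append([s / total for s in sims] if total > 0 else [1.0 / n] * n)
    return rows


def stationary_scores(P, damping=0.1, tol=1e-10, max_iter=1000):
    """Power iteration with a uniform restart term (assumed, PageRank-style)."""
    n = len(P)
    c = [1.0 / n] * n
    for _ in range(max_iter):
        nxt = [damping / n
               + (1.0 - damping) * sum(c[i] * P[i][j] for i in range(n))
               for j in range(n)]
        if sum(abs(a - b) for a, b in zip(nxt, c)) < tol:
            return nxt
        c = nxt
    return c


# Toy data (made up): four vectors pointing in similar directions and one that does not.
points = [(1.0, 0.1), (1.0, 0.2), (0.9, 0.1), (1.0, 0.15), (0.1, 1.0)]
scores = stationary_scores(transition_matrix(points))

# Rank objects from most to least outlying (lowest score first).
ranking = sorted(range(len(points)), key=lambda i: scores[i])
print(ranking)
```

In this sketch the restart term keeps the chain irreducible so the power iteration converges to a unique distribution. A shared-neighbor graph, the second construction the abstract mentions, would replace only `transition_matrix`.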
Pages: 19-36
Number of pages: 18