RDF: A density-based outlier detection method using vertical data representation

Times cited: 24
Authors
Ren, DM [1]
Wang, BY [1]
Perrizo, W [1]
Affiliation
[1] N Dakota State Univ, Dept Comp Sci, Fargo, ND 58105 USA
Source
FOURTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS | 2004
DOI
10.1109/ICDM.2004.10010
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Outlier detection can lead to the discovery of unexpected and interesting knowledge, which is critically important in areas such as monitoring criminal activity in electronic commerce and detecting credit card fraud. In this paper, we develop an efficient density-based outlier detection method for large datasets. Our contributions are: a) we introduce a relative density factor (RDF); b) based on RDF, we propose an outlier detection method that efficiently prunes data points deep within clusters and detects outliers only within the remaining small subset of the data; c) we further improve the method's performance by means of a vertical data representation, P-trees. We tested our method on NHL and NBA data. It shows an order-of-magnitude speed improvement over contemporary approaches.
Pages: 503-506
Number of pages: 4