Example-based robust DB-Outlier detection for high dimensional data

Citations: 0
Authors
Li, Yuan [1]
Kitagawa, Hiroyuki [2]
Affiliations
[1] Univ Tsukuba, Grad Sch Syst & Informat Engn, Tennoudai 1-1-1, Tsukuba, Ibaraki 3058573, Japan
[2] Univ Tsukuba, Ctr Computat Sci, Tsukuba, Ibaraki 3058573, Japan
Source
DATABASE SYSTEMS FOR ADVANCED APPLICATIONS | 2008 / Vol. 4947
Keywords
outlier; DB-Outlier; high-dimensional data; example
DOI
Not available
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
This paper presents an outlier detection method that identifies exceptional objects matching user intentions in high-dimensional datasets. Outlier detection is a crucial element of many applications, such as financial analysis and fraud detection. Despite extensive research, existing methods fail to discover outliers directly in high-dimensional datasets because of the curse of dimensionality. In addition, many algorithms require several decisive parameters to be predefined, and these parameters are difficult to determine without prior knowledge of the dataset. To address these problems, we take an Example-Based approach and examine the behavior of projections of the outlier examples in a dataset. An example-based approach is promising, since users are likely to be able to provide a few outlier examples that suggest what they want to detect. An important point is that the method should be robust even if the user-provided examples include noise or inconsistencies. Our proposed method is based on the notion of DB- (Distance-Based) Outliers. Experiments demonstrate that the proposed method is effective and efficient on both synthetic and real datasets and can tolerate noisy examples.
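As background for the DB-Outlier notion named in the abstract, the following is a minimal sketch of the commonly used DB(p, D) test: an object is flagged when at least a fraction p of the objects lie farther than distance D from it. It is not the method proposed in the paper. The function name `is_db_outlier`, the optional `dims` argument for restricting the test to a projection, the choice to count only the other objects, and the toy data are all assumptions made for illustration.

```python
import math

def is_db_outlier(points, index, p, d, dims=None):
    """Return True if points[index] is a DB(p, d)-outlier.

    Assumption: the fraction p is measured over the *other* objects, and
    distances are Euclidean over the dimensions listed in `dims`
    (all dimensions when `dims` is None).
    """
    target = points[index]
    dims = list(range(len(target))) if dims is None else list(dims)
    far = 0
    for j, other in enumerate(points):
        if j == index:
            continue
        dist = math.sqrt(sum((target[k] - other[k]) ** 2 for k in dims))
        if dist > d:
            far += 1
    return far >= p * (len(points) - 1)

# Hypothetical toy data: four clustered points and one point that is
# far away in dimension 0 but unremarkable in dimension 1.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 0.05)]

full_space = is_db_outlier(points, 4, p=0.9, d=1.0)
only_dim_0 = is_db_outlier(points, 4, p=0.9, d=1.0, dims=[0])
only_dim_1 = is_db_outlier(points, 4, p=0.9, d=1.0, dims=[1])
```

In this toy data the last point is flagged in the full space and in the projection onto dimension 0, but not in the projection onto dimension 1, which illustrates why the choice of projection matters when examining outlier examples.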
Pages: 330 / +
Page count: 3