Mining outlying aspects on numeric data

Cited by: 51
Authors
Duan, Lei [1]
Tang, Guanting [2]
Pei, Jian [2]
Bailey, James [3]
Campbell, Akiko [4]
Tang, Changjie [1]
Affiliations
[1] Sichuan Univ, Sch Comp Sci, Chengdu 610064, Sichuan, Peoples R China
[2] Simon Fraser Univ, Sch Comp Sci, Burnaby, BC V5A 1S6, Canada
[3] Univ Melbourne, Dept Comp & Informat Syst, Melbourne, Vic, Australia
[4] Pacific Blue Cross, Burnaby, BC, Canada
Funding
Australian Research Council; China Postdoctoral Science Foundation; Natural Sciences and Engineering Research Council of Canada
Keywords
Outlying aspect; Outlyingness degree; Kernel density estimation; Subspace search; Outlier detection
DOI
10.1007/s10618-014-0398-2
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
When we investigate an object in a data set, which itself may or may not be an outlier, can we identify unusual (i.e., outlying) aspects of the object? In this paper, we identify the novel problem of mining outlying aspects on numeric data. Given a query object in a multidimensional numeric data set, in which subspace is the query object most outlying? Technically, we use the rank of the probability density of an object in a subspace to measure the outlyingness of the object in that subspace. A minimal subspace where the query object is ranked the best is an outlying aspect. Computing the outlying aspects of a query object is far from trivial. A naive method has to calculate the probability densities of all objects and rank them in every subspace, which is very costly when the dimensionality is high. We systematically develop a heuristic method that is capable of searching data sets with tens of dimensions efficiently. Our empirical study using both real data and synthetic data demonstrates that our method is effective and efficient.
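The naive baseline described in the abstract can be sketched as follows. This is an illustrative sketch only, not the authors' heuristic method: the function names, the fixed Gaussian bandwidth, and the reading of "minimal" as "lowest dimensionality among the best-ranked subspaces" are all assumptions made here. Because only the rank within each subspace matters, the kernel's normalising constant is omitted.

```python
import itertools
import numpy as np


def kde_scores(sub, bandwidth):
    """Unnormalised Gaussian kernel density score of every row of `sub`.

    Assumption: one fixed bandwidth for all dimensions (the paper may
    choose bandwidths differently).
    """
    # Pairwise squared distances between all objects, scaled by bandwidth.
    diff = (sub[:, None, :] - sub[None, :, :]) / bandwidth
    sq_dist = np.sum(diff ** 2, axis=2)
    return np.exp(-0.5 * sq_dist).mean(axis=1)


def density_rank(data, query_idx, dims, bandwidth=0.5):
    """Rank of the query object's density in the subspace `dims`.

    Rank 1 means the query has the lowest density, i.e. it is the most
    outlying object in that subspace.
    """
    scores = kde_scores(data[:, list(dims)], bandwidth)
    return int(np.sum(scores < scores[query_idx])) + 1


def naive_outlying_aspects(data, query_idx, max_dim=None, bandwidth=0.5):
    """Enumerate every subspace and keep the best-ranked, smallest ones.

    The cost grows exponentially with the number of dimensions, which is
    the expense the abstract attributes to the naive method.
    """
    n_dims = data.shape[1]
    max_dim = max_dim or n_dims
    best_rank, best = None, []
    # Subspaces are visited in order of increasing dimensionality, so a
    # larger subspace that only ties the best rank is never recorded.
    for k in range(1, max_dim + 1):
        for dims in itertools.combinations(range(n_dims), k):
            rank = density_rank(data, query_idx, dims, bandwidth)
            if best_rank is None or rank < best_rank:
                best_rank, best = rank, [dims]
            elif rank == best_rank and len(dims) == len(best[0]):
                best.append(dims)
    return best_rank, best
```

Calling `naive_outlying_aspects(data, query_idx)` on an `(n, d)` NumPy array returns the best rank found and the list of subspaces (as tuples of column indices) that achieve it. Each subspace costs a full pairwise density computation, so this brute-force search is only practical for a handful of dimensions.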
Pages: 1116-1151
Page count: 36