Detecting Outlying Properties of Exceptional Objects

Cited by: 50
Authors
Angiulli, Fabrizio [1]
Fassetti, Fabio [2]
Palopoli, Luigi [1]
Affiliations
[1] Univ Calabria, DIES, I-87036 Arcavacata Di Rende, CS, Italy
[2] ICAR CNR, I-87036 Arcavacata Di Rende, CS, Italy
Source
ACM TRANSACTIONS ON DATABASE SYSTEMS | 2009 / Vol. 34 / No. 1
Keywords
Algorithms; Management; Data mining; knowledge discovery; outlier characterization; HIGH-DIMENSIONAL DATA; ALGORITHMS; COMPLEXITY
DOI
10.1145/1508857.1508864
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Assume you are given a data population characterized by a certain number of attributes. Assume, moreover, you are provided with the information that one of the individuals in this data population is abnormal, but no reason whatsoever is given to you as to why this particular individual is to be considered abnormal. In several cases, you will be indeed interested in discovering such reasons. This article is precisely concerned with this problem of discovering sets of attributes that account for the (a priori stated) abnormality of an individual within a given dataset. A criterion is presented to measure the abnormality of combinations of attribute values featured by the given abnormal individual with respect to the reference population. In this respect, each subset of attributes is intended to somehow represent a "property" of individuals. We distinguish between global and local properties. Global properties are subsets of attributes explaining the given abnormality with respect to the entire data population. With local ones, instead, two subsets of attributes are singled out, where the former one justifies the abnormality within the data subpopulation selected using the values taken by the exceptional individual on those attributes included in the latter one. The problem of individuating abnormal properties with associated explanations is formally stated and analyzed. Such a formal characterization is then exploited in order to devise efficient algorithms for detecting both global and local forms of most abnormal properties. The experimental evidence, which is accounted for in the article, shows that the algorithms are both able to mine meaningful information and to accomplish the computational task by examining a negligible fraction of the search space.
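The distinction between global and local properties can be illustrated with a small sketch. The abstract does not state the abnormality criterion itself, so the code below substitutes a plain relative-frequency score as a stand-in; the function names, the toy population, and the scoring rule are all assumptions made for illustration and are not the measure or the algorithms from the article.

```python
def match_fraction(rows, target, attrs):
    """Fraction of rows agreeing with target on every attribute in attrs."""
    if not rows:
        return 0.0
    hits = sum(all(r[a] == target[a] for a in attrs) for r in rows)
    return hits / len(rows)


def global_rarity(rows, target, prop):
    """Stand-in score (assumption): how rare the target's values on `prop`
    are within the entire population."""
    return 1.0 - match_fraction(rows, target, prop)


def local_rarity(rows, target, prop, explanation):
    """Same stand-in score, computed only inside the subpopulation that
    shares the target's values on the `explanation` attributes."""
    sub = [r for r in rows if all(r[a] == target[a] for a in explanation)]
    return 1.0 - match_fraction(sub, target, prop)


# Hypothetical toy population; the last row is the individual flagged as abnormal.
target = {"dept": "eng", "bonus": "high"}
population = (
    [{"dept": "sales", "bonus": "high"} for _ in range(4)]
    + [{"dept": "eng", "bonus": "low"} for _ in range(5)]
    + [target]
)

global_score = global_rarity(population, target, ["bonus"])
local_score = local_rarity(population, target, ["bonus"], ["dept"])
print(global_score, local_score)
```

In this toy data a high bonus is unremarkable across the whole population but unusual among individuals in the same department as the target, so the attribute set `["bonus"]` scores higher as a local property explained by `["dept"]` than as a global one.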
Pages: 62