Detecting outliers in categorical data through rough clustering

Cited by: 14
Authors
Suri, N. N. R. Ranga [1]
Murty, M. Narasimha [2]
Athithan, G. [3]
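Affiliations
[1] Centre for Artificial Intelligence and Robotics, Bangalore, Karnataka, India
[2] Indian Institute of Science, Department of Computer Science and Automation, Bangalore, Karnataka, India
[3] Scientific Analysis Group, Metcalfe House, Delhi, India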
Keywords
Data mining; Outlier detection; Soft computing; Rough sets; Data clustering; Categorical data; SET; ALGORITHM
DOI
10.1007/s11047-015-9489-2
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Outlier detection is an important data mining task with many contemporary applications. Clustering-based methods for outlier detection try to identify the data objects that deviate from the normal data. However, the uncertainty regarding the cluster membership of an outlier object has to be handled appropriately during the clustering process. Additionally, clustering data described by categorical attributes is challenging, because it is difficult to define the requisite methods and measures for such data. To address these issues, a novel algorithm for clustering categorical data aimed at outlier detection is proposed here by modifying the standard k-modes algorithm. The uncertainty in the clustering process is handled with a soft computing approach based on rough sets: the modified clustering algorithm incorporates the lower and upper approximation properties of rough sets. The efficacy of the proposed rough k-modes clustering algorithm for outlier detection is demonstrated on various benchmark categorical data sets.
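The abstract describes the method only at a high level, so the following Python sketch illustrates the general idea under stated assumptions rather than reproducing the authors' algorithm. It assumes simple-matching dissimilarity between categorical objects, a farthest-first seeding of the modes, and an assignment rule borrowed from the general rough k-means scheme: an object joins the lower approximation of its nearest cluster unless another mode is within a threshold `delta`, in which case it joins the boundary region of every such cluster. Modes are then recomputed with lower members weighted by `w_lower` and boundary members by `w_upper`. The names `rough_k_modes`, `delta`, `w_lower`, `w_upper`, and the nearest-mode `outlier_scores` are all hypothetical.

```python
from collections import Counter


def dissimilarity(x, y):
    # Simple matching dissimilarity: count of attributes whose values differ.
    return sum(a != b for a, b in zip(x, y))


def init_modes(data, k):
    # Assumed farthest-first seeding: start from the first object, then keep
    # adding the object whose nearest chosen mode is farthest away.
    modes = [data[0]]
    while len(modes) < k:
        modes.append(max(data, key=lambda x: min(dissimilarity(x, m) for m in modes)))
    return modes


def rough_assign(data, modes, delta):
    # An object goes to the lower approximation of its nearest cluster, unless
    # other modes are within `delta` of the nearest distance; then it goes to the
    # boundary (upper minus lower approximation) of every such cluster.
    k = len(modes)
    lower = [[] for _ in range(k)]
    boundary = [[] for _ in range(k)]
    for x in data:
        d = [dissimilarity(x, m) for m in modes]
        best = min(range(k), key=d.__getitem__)
        close = [c for c in range(k) if c != best and d[c] - d[best] <= delta]
        if close:
            for c in [best] + close:
                boundary[c].append(x)
        else:
            lower[best].append(x)
    return lower, boundary


def weighted_mode(lower, boundary, w_lower, w_upper):
    # Per-attribute mode in which lower-approximation members vote with weight
    # w_lower and boundary members with weight w_upper.
    n_attrs = len((lower or boundary)[0])
    mode = []
    for j in range(n_attrs):
        votes = Counter()
        for obj in lower:
            votes[obj[j]] += w_lower
        for obj in boundary:
            votes[obj[j]] += w_upper
        mode.append(votes.most_common(1)[0][0])
    return tuple(mode)


def rough_k_modes(data, k, delta=0, w_lower=0.7, w_upper=0.3, max_iter=20):
    # Hypothetical rough k-modes loop: assign, recompute modes, stop when stable.
    modes = init_modes(data, k)
    for _ in range(max_iter):
        lower, boundary = rough_assign(data, modes, delta)
        # A cluster with no members keeps its previous mode.
        new_modes = [
            weighted_mode(lower[c], boundary[c], w_lower, w_upper)
            if lower[c] or boundary[c] else modes[c]
            for c in range(k)
        ]
        if new_modes == modes:
            break
        modes = new_modes
    return modes, lower, boundary


def outlier_scores(data, modes):
    # Hypothetical score: distance to the nearest mode (larger = more outlying).
    return [min(dissimilarity(x, m) for m in modes) for x in data]


data = [
    ("red", "small", "round"),
    ("red", "small", "round"),
    ("red", "small", "oval"),
    ("blue", "large", "square"),
    ("blue", "large", "square"),
    ("blue", "large", "rect"),
    ("green", "tiny", "star"),  # deliberately unusual object
]
modes, lower, boundary = rough_k_modes(data, k=2)
scores = outlier_scores(data, modes)
print(modes)   # [('red', 'small', 'round'), ('blue', 'large', 'square')]
print(scores)  # [0, 0, 1, 0, 0, 1, 3]
```

In this toy run the unusual object is equally far (three mismatches) from both modes, so with `delta=0` it lands in the boundary region of both clusters instead of either lower approximation, and it receives the largest score. Giving boundary members a smaller weight (`w_upper < w_lower`) keeps such uncertain objects from pulling the modes toward themselves.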
Pages: 385-394
Number of pages: 10