MEASUREMENT OF SIMILARITY USING LINK BASED CLUSTER APPROACH FOR CATEGORICAL DATA

Cited by: 0
Authors
Pavithra, M. [1]
Chandrakala, D. [1]
Affiliations
[1] Kumaraguru Coll Technol, Dept Comp Sci & Engn, Coimbatore, Tamil Nadu, India
Source
2013 INTERNATIONAL CONFERENCE ON INFORMATION COMMUNICATION AND EMBEDDED SYSTEMS (ICICES) | 2013
Keywords
Clustering; Data mining; Categorical data; Cluster ensemble; Link-based similarity; Refined matrix; C-Rank link-based cluster; Consensus
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812
Abstract
Clustering categorizes data into groups, or clusters, such that the data in the same cluster are more similar to each other than to those in different clusters. Cluster ensembles have been applied to the problem of clustering categorical data, but these techniques generate the final data partition from incomplete information. The underlying ensemble-information matrix presents only cluster-data point relations, with many entries left unknown, and this degrades the quality of the clustering result. To improve clustering quality, a new link-based approach refines the conventional matrix by discovering unknown entries through the similarity between clusters in an ensemble, and an efficient link-based algorithm is proposed for the underlying similarity assessment. This paper proposes the C-Rank link-based algorithm to improve clustering quality and to rank clusters in weighted networks. C-Rank consists of three major phases: (1) identification of candidate clusters; (2) ranking of the candidates by integrated cohesion; and (3) elimination of non-maximal clusters. Finally, a graph partitioning technique is applied to a weighted bipartite graph formulated from the refined matrix to obtain the final clustering result.
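The idea of an ensemble-information matrix with unknown entries can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `binary_association_matrix` and `refine`, the `decay` parameter, and the use of Jaccard edge weights with a connected-triple style score are all assumptions chosen for illustration. The first function builds the conventional binary matrix (one column per cluster of every base clustering); the second replaces the zero entries with a similarity between the point's own cluster and the other clusters of the same base clustering.

```python
import numpy as np

def binary_association_matrix(partitions):
    """One indicator column per cluster of every base clustering (hypothetical helper)."""
    columns, owners = [], []
    for m, labels in enumerate(partitions):
        labels = np.asarray(labels)
        for c in np.unique(labels):
            columns.append((labels == c).astype(float))
            owners.append(m)  # remember which base clustering the column came from
    return np.column_stack(columns), np.array(owners)

def refine(bam, owners, decay=0.9):
    """Fill zero entries using a simplified connected-triple similarity (an assumption)."""
    # Jaccard overlap between every pair of clusters acts as the edge weight.
    inter = bam.T @ bam
    sizes = bam.sum(axis=0)
    w = inter / (sizes[:, None] + sizes[None, :] - inter)
    np.fill_diagonal(w, 0.0)

    # Two clusters are similar when they share strongly connected neighbours.
    k = bam.shape[1]
    triple = np.zeros((k, k))
    for x in range(k):
        for y in range(k):
            triple[x, y] = np.minimum(w[x], w[y]).sum()

    off_diagonal = triple[~np.eye(k, dtype=bool)]
    peak = off_diagonal.max() if off_diagonal.size else 0.0
    sim = decay * triple / peak if peak > 0 else np.zeros((k, k))
    np.fill_diagonal(sim, 1.0)  # a point keeps full membership in its own cluster

    refined = np.empty_like(bam)
    for m in np.unique(owners):
        cols = np.where(owners == m)[0]
        # Column index of each point's own cluster within base clustering m.
        home = cols[bam[:, cols].argmax(axis=1)]
        refined[:, cols] = sim[np.ix_(home, cols)]
    return refined

if __name__ == "__main__":
    # Two base clusterings of four data points (toy labels, not from the paper).
    parts = [[0, 0, 1, 1], [0, 1, 1, 1]]
    bam, owners = binary_association_matrix(parts)
    print(bam)
    print(refine(bam, owners))
```

The refined matrix can then be read as the weighted bipartite graph between data points and clusters that the abstract mentions; the graph partitioning step itself is not sketched here.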
Pages: 508-516
Page count: 9