An efficient clustering algorithm for mixed type attributes in large dataset

Cited by: 0
Authors
Yin, R [1]
Tan, ZF [1]
Ren, JT [1]
Chen, YQ [1]
Affiliations
[1] Zhongshan Univ, Dept Comp Sci, Guangzhou 510275, Peoples R China
Source
PROCEEDINGS OF 2005 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-9 | 2005
Keywords
data mining; clustering; CF*-tree; k-prototype
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Clustering is a widely used technique in data mining. Many clustering algorithms exist, but most are either limited to a single attribute type or can handle mixed attribute types but are inefficient on large datasets; few do both well. In this article, we propose a clustering algorithm that handles large datasets with mixed-type attributes. We first use a CF*-tree (similar to the CF-tree in BIRCH) to pre-cluster the dataset, so that the dense regions are stored in the leaf nodes. We then treat each dense region as a single point and cluster these regions with an improved k-prototype algorithm. Experiments show that the algorithm is very efficient at clustering large datasets with mixed-type attributes.
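The two-stage idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not specify the CF*-tree layout or how k-prototype was improved, so `summarize_region` (count, numeric means, categorical modes) is a hypothetical stand-in for a leaf entry, and `assign` is a plain nearest-prototype step. Only `mixed_distance` follows a published definition, the k-prototype dissimilarity of Huang (1998): squared Euclidean distance on numeric attributes plus a weight `gamma` times the number of categorical mismatches.

```python
from collections import Counter


def mixed_distance(x_num, x_cat, c_num, c_cat, gamma=1.0):
    """k-prototype dissimilarity (Huang, 1998) between a point and a prototype."""
    numeric = sum((a - b) ** 2 for a, b in zip(x_num, c_num))
    mismatches = sum(1 for a, b in zip(x_cat, c_cat) if a != b)
    return numeric + gamma * mismatches


def summarize_region(points):
    """Collapse a dense region into one weighted point.

    Assumption: a region is a list of (numeric_tuple, categorical_tuple)
    pairs, summarized as (count, numeric means, categorical modes).
    This is a hypothetical stand-in for a CF*-tree leaf entry.
    """
    n = len(points)
    numeric_columns = zip(*(p[0] for p in points))
    categorical_columns = zip(*(p[1] for p in points))
    means = tuple(sum(col) / n for col in numeric_columns)
    modes = tuple(Counter(col).most_common(1)[0][0] for col in categorical_columns)
    return n, means, modes


def assign(summaries, prototypes, gamma=1.0):
    """Return, for each region summary, the index of its nearest prototype."""
    return [
        min(
            range(len(prototypes)),
            key=lambda k: mixed_distance(
                s[1], s[2], prototypes[k][0], prototypes[k][1], gamma
            ),
        )
        for s in summaries
    ]
```

Because each summary keeps its point count, a prototype update step could weight every region by that count, so that clustering the summaries approximates clustering the original points at a fraction of the cost.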
Pages: 1611-1614
Page count: 4
Related papers
7 in total
[1] CHEN PJ, 2004, COMPUTER ENG APPL, P190
[2] Chiu T., 2001, Proceedings of the 7th ACM SIGKDD, P263, DOI 10.1145/502512.502549
[3] Ester M., 1996, P 1996 INT C KNOWL D, P266
[4] Huang, ZX. Extensions to the k-means algorithm for clustering large data sets with categorical values [J]. DATA MINING AND KNOWLEDGE DISCOVERY, 1998, 2 (03): 283-304
[5] MACQUEEN J, 1967, P 5 BERK S MATH STAT, V1, P128
[6] Ng R.T., 1994, Proceedings of the 20th International Conference on Very Large Data Bases, VLDB '94, P144
[7] Zhang T., 1996, Proceedings of the 1996 ACM SIGMOD international conference on Management of data, V25, P103, DOI 10.1145/235968.233324