Fast Density Clustering Algorithm for Numerical Data and Categorical Data

被引:9
作者
Chen Jinyin [1 ]
He Huihao [1 ]
Chen Jungan [2 ]
Yu Shanqing [1 ]
Shi Zhaoxia [1 ]
机构
[1] Zhejiang Univ Technol, Hangzhou 310023, Zhejiang, Peoples R China
[2] Ningbo Wanli Univ, Dept Elect Engn, Ningbo 310023, Zhejiang, Peoples R China
基金
中国国家自然科学基金; 浙江省自然科学基金;
关键词
MIXED DATA;
D O I
10.1155/2017/6393652
中图分类号
T [工业技术];
学科分类号
08 ;
摘要
Data objects with mixed numerical and categorical attributes are often dealt with in the real world. Most existing algorithms have limitations such as low clustering quality, cluster center determination difficulty, and initial parameter sensibility. A fast density clustering algorithm (FDCA) is put forward based on one-time scan with cluster centers automatically determined by center set algorithm (CSA). A novel data similarity metric is designed for clustering data including numerical attributes and categorical attributes. CSA is designed to choose cluster centers from data object automatically which overcome the cluster centers setting difficulty in most clustering algorithms. The performance of the proposed method is verified through a series of experiments on ten mixed data sets in comparison with several other clustering algorithms in terms of the clustering purity, the efficiency, and the time complexity.
引用
收藏
页数:15
相关论文
共 31 条
[31]  
Zhexue Huang, 1997, Proceedings of the First Pacific-Asia Conference on Knowledge Discovery and Data Mining. KDD: Techniques and Applications, P21