Fast Density Clustering Algorithm for Numerical Data and Categorical Data

Cited by: 9
Authors
Chen Jinyin [1 ]
He Huihao [1 ]
Chen Jungan [2 ]
Yu Shanqing [1 ]
Shi Zhaoxia [1 ]
Affiliations
[1] Zhejiang Univ Technol, Hangzhou 310023, Zhejiang, Peoples R China
[2] Ningbo Wanli Univ, Dept Elect Engn, Ningbo 310023, Zhejiang, Peoples R China
Funding
National Natural Science Foundation of China; Natural Science Foundation of Zhejiang Province;
Keywords
MIXED DATA;
DOI
10.1155/2017/6393652
Chinese Library Classification number
T [Industrial Technology];
Discipline classification code
08 ;
Abstract
Data objects with mixed numerical and categorical attributes are common in real-world applications. Most existing algorithms suffer from limitations such as low clustering quality, difficulty in determining cluster centers, and sensitivity to initial parameters. A fast density clustering algorithm (FDCA) is proposed, based on a one-time scan, with cluster centers determined automatically by a center set algorithm (CSA). A novel data similarity metric is designed for clustering data that include both numerical and categorical attributes. CSA chooses cluster centers from the data objects automatically, which overcomes the difficulty of setting cluster centers found in most clustering algorithms. The performance of the proposed method is verified through a series of experiments on ten mixed data sets, in comparison with several other clustering algorithms, in terms of clustering purity, efficiency, and time complexity.
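The abstract does not state the similarity metric or the CSA procedure, so the following is only an illustrative sketch of the general idea of measuring distance over mixed attributes and counting neighbors within a cutoff. It uses a Gower-style distance (range-normalized difference for numerical attributes, simple mismatch for categorical ones), which is a stand-in and not the authors' metric. The names mixed_distance, local_density, numeric_idx, categorical_idx, ranges, and cutoff are all hypothetical.

```python
from typing import Dict, List, Sequence


def mixed_distance(
    a: Sequence,
    b: Sequence,
    numeric_idx: List[int],
    categorical_idx: List[int],
    ranges: Dict[int, float],
) -> float:
    """Gower-style distance in [0, 1] (an assumption, not the paper's metric)."""
    total = 0.0
    for i in numeric_idx:
        span = ranges[i]
        # Range-normalized absolute difference; constant attributes contribute 0.
        total += abs(a[i] - b[i]) / span if span > 0 else 0.0
    for i in categorical_idx:
        # Simple matching: 0 if the categories agree, 1 otherwise.
        total += 0.0 if a[i] == b[i] else 1.0
    return total / (len(numeric_idx) + len(categorical_idx))


def local_density(
    points: List[Sequence],
    cutoff: float,
    numeric_idx: List[int],
    categorical_idx: List[int],
    ranges: Dict[int, float],
) -> List[int]:
    """For each point, count the other points closer than the cutoff."""
    densities = []
    for i, p in enumerate(points):
        count = sum(
            1
            for j, q in enumerate(points)
            if j != i
            and mixed_distance(p, q, numeric_idx, categorical_idx, ranges) < cutoff
        )
        densities.append(count)
    return densities


# Toy data: one numerical attribute (index 0) and one categorical attribute (index 1).
points = [(1.0, "red"), (2.0, "red"), (9.0, "blue")]
ranges = {0: 8.0}  # max - min of the numerical attribute over the toy data
densities = local_density(points, 0.5, [0], [1], ranges)
```

In a density-based scheme, points with high local density that are also far from any denser point are natural candidates for cluster centers; how FDCA and CSA actually select them is described in the paper itself, not here.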
Pages: 15
Related papers
31 entries in total
[1]   A k-mean clustering algorithm for mixed numeric and categorical data [J].
Ahmad, Amir ;
Dey, Lipika .
DATA & KNOWLEDGE ENGINEERING, 2007, 63 (02) :503-527
[2]  
[Anonymous], 2010, P IEEE C EV COMP
[3]  
[Anonymous], 1996, P AAAI INT C KNOWL D
[4]  
Barbara D., 2002, Proceedings of the Eleventh International Conference on Information and Knowledge Management. CIKM 2002, P582, DOI 10.1145/584792.584888
[5]   A fuzzy c-means-type algorithm for clustering of data with mixed numeric and categorical attributes employing a probabilistic dissimilarity functional [J].
Chatzis, Sotirios P. .
EXPERT SYSTEMS WITH APPLICATIONS, 2011, 38 (07) :8684-8689
[6]   A fast density-based data stream clustering algorithm with cluster centers self-determined for mixed data [J].
Chen, Jin-Yin ;
He, Hui-Hao .
INFORMATION SCIENCES, 2016, 345 :271-293
[7]  
[陈晋音 Chen Jinyin], 2015, [自动化学报, Acta Automatica Sinica], V41, P1798
[8]   Categorical-and-numerical-attribute data clustering based on a unified similarity metric without knowing cluster number [J].
Cheung, Yiu-ming ;
Jia, Hong .
PATTERN RECOGNITION, 2013, 46 (08) :2228-2238
[9]   SpectralCAT: Categorical spectral clustering of numerical and nominal data [J].
David, Gil ;
Averbuch, Amir .
PATTERN RECOGNITION, 2012, 45 (01) :416-433
[10]  
Everitt B. S., 2001, CLUSTER ANAL