Comparison of internal clustering validation indices for prototype-based clustering

Cited by: 65
Authors
Hämäläinen J. [1]
Jauhiainen S. [1]
Kärkkäinen T. [1]
Affiliations
[1] Faculty of Information Technology, University of Jyvaskyla, P.O. Box 35, Jyvaskyla
Keywords
Clustering validation index; Prototype-based clustering; Robust statistics
DOI
10.3390/a10030105
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Clustering is an unsupervised machine learning and pattern recognition method. In general, in addition to revealing hidden groups of similar observations (clusters), the number of such groups needs to be determined. Internal clustering validation indices estimate this number without any external information. The purpose of this article is to evaluate empirically the characteristics of a representative set of internal clustering validation indices on many datasets. The prototype-based clustering framework includes multiple statistical estimates of cluster location, both classical and robust, so that the overall setting of the paper is novel. General observations are given on the quality of the validation indices and on the behavior of different variants of clustering algorithms. © 2017 by the authors.