A similarity-based K-prototypes algorithm for mixed attributes

Cited by: 0

Authors:

Yang, Yang [1]

Liu, Qian [1]

Gao, Zhipeng [1]

Qiu, Xuesong [1]

Rui, Lanlan [1]

Affiliation:

[1] State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing

Source:

Journal of Computational Information Systems | 2015 / Vol. 11 / No. 14

Keywords:

Clustering; Information Entropy; K-prototypes; Similarity

DOI:

10.12733/jcis14512

Abstract:

The aim of clustering is to partition a given set of data objects into homogeneous clusters of similar objects. Because datasets often combine numerical and categorical attributes, the K-prototypes algorithm was proposed to handle mixed attributes in data mining. However, it always uses the Hamming distance to measure the difference between two categorical attribute values, which cannot fully capture the differences between samples in complex datasets. To address this, we calculate the similarity for numerical attributes using an information entropy mechanism, and for categorical attributes we introduce the similarity between a sample object and the other samples in the same cluster. Simulation results show that, compared with the traditional algorithm, our algorithm improves stability and accuracy to some extent. © 2015, Journal of Computational Information Systems. All rights reserved.
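For context, the baseline that the abstract criticizes can be sketched as follows. This is a minimal illustration of the conventional K-prototypes dissimilarity (squared Euclidean distance on numerical attributes plus a weighted Hamming, or simple-matching, count on categorical attributes), not the similarity measure proposed in the paper. The function name, argument names, and the weight `gamma` are hypothetical and chosen only for this example.

```python
def k_prototypes_dissimilarity(x_num, x_cat, proto_num, proto_cat, gamma=1.0):
    """Conventional mixed-attribute dissimilarity between a sample and a prototype.

    Assumed form (baseline only, not the paper's proposed measure):
    squared Euclidean distance over numerical attributes plus
    gamma times the number of mismatching categorical attributes.
    """
    # Numerical part: sum of squared differences.
    numeric_part = sum((a - b) ** 2 for a, b in zip(x_num, proto_num))
    # Categorical part: Hamming (simple matching) count, 1 per mismatch.
    categorical_part = sum(1 for a, b in zip(x_cat, proto_cat) if a != b)
    return numeric_part + gamma * categorical_part


# One numerical attribute differs by 1.0 and one categorical attribute mismatches.
d = k_prototypes_dissimilarity(
    [1.0, 2.0], ["red", "small"],
    [0.0, 2.0], ["red", "large"],
    gamma=0.5,
)
```

Because every categorical mismatch contributes the same fixed penalty, this measure cannot tell a rare category value from a common one, which is the kind of limitation the abstract points to.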

Pages: 5013-5021

Page count: 8
