The Performance of Objective Functions for Clustering Categorical Data

Citations: 0
Authors
Xiang, Zhengrong [1]
Islam, Md Zahidul [2]
Affiliations
[1] Zhejiang Univ, Coll Comp Sci, Hangzhou, Zhejiang, Peoples R China
[2] Charles Sturt Univ, Sch Comp & Math, Bathurst, NSW 2795, Australia
Source
KNOWLEDGE MANAGEMENT AND ACQUISITION FOR SMART SYSTEMS AND SERVICES, PKAW 2014 | 2014 / Vol. 8863
Keywords
Objective Function; Clustering; Categorical data; Transfer algorithm; ALGORITHM; ATTRIBUTES;
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Partitioning methods, such as k-means, are popular and useful for clustering. We recently proposed a new partitioning method for clustering categorical data: using the transfer algorithm to optimize an objective function called within-cluster dispersion. Preliminary experimental results showed that this method outperforms a standard method, k-modes, in terms of the average quality of clustering results. In this paper, we compare the performance of objective functions for categorical data in more depth. First, we analytically compare the quality of three objective functions: k-medoids, k-modes, and within-cluster dispersion. Second, we measure how well these objectives find true structures in real data sets by finding their global optima, which we argue is a better measure than average clustering results. We conclude that within-cluster dispersion is generally a better objective for discovering cluster structures. Finally, we evaluate the performance of various distance measures with within-cluster dispersion and report some observations.
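To make the three objectives named in the abstract concrete, the following is a minimal sketch that scores one fixed partition under each of them. It is not taken from the paper. Two assumptions are made here: the dissimilarity is simple matching (the number of attributes on which two objects differ), and within-cluster dispersion takes the commonly used form of the sum of pairwise dissimilarities inside each cluster divided by the cluster size. The exact definitions used by the authors are not given in this record, and all function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def hamming(a, b):
    """Simple matching dissimilarity: number of differing attributes (assumed)."""
    return sum(x != y for x, y in zip(a, b))


def k_modes_cost(clusters):
    """Sum of dissimilarities from each object to its cluster's mode vector."""
    total = 0
    for members in clusters:
        # The mode takes the most frequent category of each attribute.
        mode = tuple(Counter(col).most_common(1)[0][0] for col in zip(*members))
        total += sum(hamming(obj, mode) for obj in members)
    return total


def k_medoids_cost(clusters):
    """Sum of dissimilarities from each object to the best medoid of its cluster."""
    total = 0
    for members in clusters:
        # The medoid must be an actual member of the cluster.
        total += min(sum(hamming(obj, m) for obj in members) for m in members)
    return total


def within_cluster_dispersion(clusters):
    """Assumed form: per cluster, pairwise dissimilarity sum divided by size."""
    total = 0.0
    for members in clusters:
        pair_sum = sum(hamming(a, b) for a, b in combinations(members, 2))
        total += pair_sum / len(members)
    return total


if __name__ == "__main__":
    # Toy partition of five objects with two categorical attributes each.
    clusters = [
        [("a", "x"), ("a", "y"), ("b", "x")],
        [("c", "z"), ("c", "z")],
    ]
    print(k_modes_cost(clusters))
    print(k_medoids_cost(clusters))
    print(within_cluster_dispersion(clusters))
```

Unlike the other two, the dispersion score in this sketch needs no representative (mode or medoid) for a cluster, since it depends only on pairwise dissimilarities among members.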
Pages: 16-28
Page count: 13