A Global K-modes Algorithm for Clustering Categorical Data

Cited by: 0
Authors
Bai Tian [1, 2]
Kulikowski, C. A. [2 ]
Gong Leiguang [3 ]
Yang Bin [1 ]
Huang Lan [1 ]
Zhou Chunguang [1 ]
Affiliations
[1] Jilin Univ, Coll Comp Sci & Technol, Changchun 130012, Peoples R China
[2] Rutgers State Univ, Dept Comp Sci, New Brunswick, NJ 08903 USA
[3] IBM Thomas J Watson Res Ctr, Hawthorne, NJ USA
Source
CHINESE JOURNAL OF ELECTRONICS | 2012 / Vol. 21 / No. 03
Funding
National Natural Science Foundation of China;
Keywords
Categorical data; Clustering; Data mining; K-modes algorithm;
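DOI
Not available
Chinese Library Classification number
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline classification code
0808; 0809;
Abstract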
In this paper, a new Global k-modes (GKM) algorithm is proposed for clustering categorical data. The new method randomly selects a sufficiently large number of initial modes to account for the global distribution of the data set, and then progressively eliminates the redundant modes using an iterative optimization process with an elimination criterion function. Systematic experiments were carried out with data from the UCI Machine Learning Repository. The results and a comparative evaluation show that the proposed method performs well and consistently, and that it achieves a significant improvement in clustering accuracy over other well-known k-modes-type algorithms.
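The general shape of the procedure described in the abstract (oversample initial modes, run k-modes, eliminate a redundant mode, repeat) can be illustrated with a minimal Python sketch. This is not the authors' GKM algorithm: the abstract does not state the elimination criterion function, so the sketch substitutes a stand-in rule that drops the mode with the fewest assigned records. The function names (`hamming`, `k_modes`, `global_k_modes_sketch`) and the parameters `n_initial` and `seed` are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def hamming(a, b):
    """Simple matching dissimilarity: number of attributes that differ."""
    return sum(x != y for x, y in zip(a, b))


def assign(data, modes):
    """Assign each record to the index of its nearest mode."""
    return [min(range(len(modes)), key=lambda j: hamming(row, modes[j]))
            for row in data]


def update_modes(data, labels, modes):
    """Recompute each mode as the per-attribute most frequent value."""
    new_modes = []
    for j, old in enumerate(modes):
        members = [row for row, lab in zip(data, labels) if lab == j]
        if not members:
            new_modes.append(old)  # an empty cluster keeps its old mode
            continue
        new_modes.append(tuple(Counter(col).most_common(1)[0][0]
                               for col in zip(*members)))
    return new_modes


def k_modes(data, modes, max_iter=100):
    """Standard k-modes iteration from the given starting modes."""
    for _ in range(max_iter):
        labels = assign(data, modes)
        new_modes = update_modes(data, labels, modes)
        if new_modes == modes:
            break
        modes = new_modes
    return modes, assign(data, modes)


def global_k_modes_sketch(data, k, n_initial, seed=0):
    """Start with many random modes and prune them one at a time down to k.

    ASSUMPTION: the pruning rule (drop the mode with the fewest assigned
    records) is a placeholder, not the paper's elimination criterion.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    rng = random.Random(seed)
    data = [tuple(row) for row in data]
    distinct = list(dict.fromkeys(data))  # distinct records, order preserved
    modes = rng.sample(distinct, min(n_initial, len(distinct)))
    while True:
        modes, labels = k_modes(data, modes)
        if len(modes) <= k:
            return modes, labels
        sizes = Counter(labels)
        weakest = min(range(len(modes)), key=lambda j: sizes.get(j, 0))
        modes = modes[:weakest] + modes[weakest + 1:]


if __name__ == "__main__":
    records = [
        ("a", "x", "p"), ("a", "x", "p"), ("a", "x", "q"), ("a", "y", "p"),
        ("b", "y", "q"), ("b", "y", "q"), ("b", "y", "p"), ("b", "x", "q"),
    ]
    final_modes, final_labels = global_k_modes_sketch(records, k=2, n_initial=4)
    print(final_modes)
    print(final_labels)
```

Starting from more modes than the target number of clusters is what lets the initial set cover the data more broadly than a single random draw of k modes; how well the pruning works in practice depends on the elimination criterion, which is exactly the part this sketch leaves as a placeholder.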
Pages: 460-465
Number of pages: 6