K-modestream algorithm for clustering categorical data streams

Cited by: 0
Authors
Ravi Sankar Sangam
Hari Om
Affiliation
[1] Indian Institute of Technology (Indian School of Mines), Department of Computer Science and Engineering
Keywords
Data mining; Data streams; Clustering; K-modes
DOI
10.1007/s40012-017-0170-z
Abstract
Clustering categorical data streams is challenging because new data points are continuously added to the existing database at a rapid pace and categorical values have no natural order. Some algorithms have recently been proposed to tackle this problem. However, all of these schemes require the user to pre-specify the number of clusters, which is not trivial and makes them inefficient in a data stream environment. In this paper, we propose a new clustering algorithm, named k-modestream, which follows the k-modes paradigm to dynamically cluster categorical data streams. It automatically computes the number of clusters and their initial modes simultaneously at regular time intervals. We analyse the time complexity of our scheme and perform various experiments on synthetic and real-world datasets to evaluate its efficacy.
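For readers unfamiliar with the k-modes paradigm that the abstract refers to, the following is a minimal sketch of the standard batch k-modes loop (simple-matching dissimilarity plus per-attribute mode updates). It is not the k-modestream algorithm itself: the function names, the fixed `initial_modes` argument, and the `max_iter` cap are illustrative assumptions, and the automatic selection of the number of clusters and initial modes described in the abstract is not reproduced here.

```python
from collections import Counter


def matching_dissimilarity(a, b):
    """Count the attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def assign(records, modes):
    """Return the index of the nearest mode for each record (ties go to the lowest index)."""
    return [
        min(range(len(modes)), key=lambda j: matching_dissimilarity(r, modes[j]))
        for r in records
    ]


def update_modes(records, labels, modes):
    """Recompute each mode as the most frequent value per attribute within its cluster."""
    new_modes = []
    for j, old_mode in enumerate(modes):
        members = [r for r, label in zip(records, labels) if label == j]
        if not members:
            new_modes.append(old_mode)  # assumption: an empty cluster keeps its old mode
            continue
        new_modes.append(
            tuple(Counter(column).most_common(1)[0][0] for column in zip(*members))
        )
    return new_modes


def k_modes(records, initial_modes, max_iter=20):
    """Alternate assignment and mode updates until the labels stop changing."""
    modes = [tuple(m) for m in initial_modes]
    labels = assign(records, modes)
    for _ in range(max_iter):
        modes = update_modes(records, labels, modes)
        new_labels = assign(records, modes)
        if new_labels == labels:
            break
        labels = new_labels
    return modes, labels


# Toy example with hypothetical data: two attributes, two clusters.
records = [("a", "x"), ("a", "y"), ("b", "z"), ("b", "z")]
modes, labels = k_modes(records, initial_modes=[("a", "x"), ("b", "z")])
```

In this sketch the number of clusters is fixed by the length of `initial_modes`, which is exactly the user-supplied input that the abstract says k-modestream computes automatically at regular time intervals.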
Pages: 295-303
Page count: 8