K-modestream algorithm for clustering categorical data streams

Cited by: 0
Authors
Ravi Sankar Sangam
Hari Om
Affiliation
[1] Indian Institute of Technology (Indian School of Mines), Department of Computer Science and Engineering
Keywords
Data mining; Data streams; Clustering; K-modes
DOI
10.1007/s40012-017-0170-z
Abstract
Clustering categorical data streams is challenging because new data points are continuously added to the existing database at a rapid pace and there is no natural order among categorical values. Several algorithms have recently been proposed for clustering categorical data streams. However, all of these schemes require the user to pre-specify the number of clusters, which is nontrivial and makes them inefficient in a data stream environment. In this paper, we propose a new clustering algorithm, named k-modestream, which follows the k-modes paradigm to cluster categorical data streams dynamically. It automatically computes the number of clusters and their initial modes simultaneously at regular time intervals. We analyse the time complexity of our scheme and evaluate its efficacy through experiments on synthetic and real-world datasets.
Pages: 295-303
Page count: 8
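
For orientation, the k-modes paradigm that the abstract refers to can be sketched as below. This is a minimal sketch of the classic batch k-modes loop, not the authors' k-modestream: it assumes the number of clusters and the initial modes are supplied by the caller, whereas the abstract states that k-modestream computes both automatically at regular time intervals. The function names (`matching_dissimilarity`, `column_modes`, `k_modes`) and the `max_iter` parameter are illustrative assumptions, not taken from the paper.

```python
from collections import Counter


def matching_dissimilarity(a, b):
    """Count the attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(records):
    """Return the most frequent value of each attribute in a non-empty cluster."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))


def k_modes(records, modes, max_iter=10):
    """Classic batch k-modes with caller-supplied initial modes (an assumption)."""
    modes = [tuple(m) for m in modes]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each record joins the cluster with the nearest mode.
        labels = [
            min(range(len(modes)),
                key=lambda j: matching_dissimilarity(r, modes[j]))
            for r in records
        ]
        # Update step: recompute each mode; keep the old one for empty clusters.
        new_modes = []
        for j, old in enumerate(modes):
            members = [r for r, lab in zip(records, labels) if lab == j]
            new_modes.append(column_modes(members) if members else old)
        if new_modes == modes:  # converged: no mode changed
            break
        modes = new_modes
    return labels, modes


records = [("a", "x"), ("a", "y"), ("b", "z"), ("b", "z"), ("a", "x")]
labels, modes = k_modes(records, modes=[("a", "x"), ("b", "z")])
```

Because categorical values have no natural order, the sketch compares records by counting mismatched attributes and summarises each cluster by its per-attribute mode rather than by a mean.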