K-modestream algorithm for clustering categorical data streams

Authors
Ravi Sankar Sangam
Hari Om
Affiliation
[1] Indian Institute of Technology (Indian School of Mines),Department of Computer Science and Engineering
Keywords
Data mining; Data streams; Clustering; K-modes
DOI
10.1007/s40012-017-0170-z
Abstract
Clustering categorical data streams is a challenging problem because new data points are continuously added to the existing database at a rapid pace and there is no natural order among categorical values. Several algorithms have recently been proposed for clustering categorical data streams. However, all of these schemes require the user to pre-specify the number of clusters, which is not trivial and makes them inefficient in a data stream environment. In this paper, we propose a new clustering algorithm, named k-modestream, which follows the k-modes paradigm to dynamically cluster categorical data streams. It automatically computes the number of clusters and their initial modes simultaneously at regular time intervals. We analyse the time complexity of our scheme and perform various experiments on synthetic and real-world datasets to evaluate its efficacy.
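As background for the k-modes paradigm the abstract refers to, the following is a minimal sketch of a batch variant of the standard k-modes algorithm (Huang, 1998): records are compared with the simple matching dissimilarity (the number of attributes on which two records differ), and each cluster is summarised by its mode (the most frequent category per attribute). It is not the authors' k-modestream: the procedure that automatically determines the number of clusters and their initial modes at regular time intervals is not described here, so the caller supplies the initial modes. The function names `matching_dissimilarity`, `compute_mode` and `kmodes` and the `max_iter` parameter are hypothetical, and the reassign-all-then-update loop is an assumption (Huang's original formulation updates a mode after each allocation).

```python
from collections import Counter
from typing import List, Sequence, Tuple

Record = Tuple[str, ...]


def matching_dissimilarity(x: Sequence[str], y: Sequence[str]) -> int:
    """Simple matching dissimilarity: count of attributes where x and y differ."""
    return sum(a != b for a, b in zip(x, y))


def compute_mode(records: List[Record]) -> Record:
    """Per-attribute most frequent category; ties go to the first value seen."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*records))


def kmodes(
    records: List[Record], initial_modes: List[Record], max_iter: int = 10
) -> Tuple[List[int], List[Record]]:
    """Hypothetical batch k-modes: reassign every record to its nearest mode,
    then recompute all modes, until assignments stop changing."""
    modes = list(initial_modes)  # copy so the caller's list is not mutated
    labels: List[int] = [-1] * len(records)
    for _ in range(max_iter):
        # Assignment step: nearest mode under simple matching (lowest index wins ties).
        new_labels = [
            min(range(len(modes)), key=lambda j: matching_dissimilarity(r, modes[j]))
            for r in records
        ]
        if new_labels == labels:
            break  # converged: no record changed cluster
        labels = new_labels
        # Update step: recompute each cluster's mode from its members.
        for j in range(len(modes)):
            members = [r for r, lab in zip(records, labels) if lab == j]
            if members:  # an empty cluster keeps its previous mode
                modes[j] = compute_mode(members)
    return labels, modes


if __name__ == "__main__":
    data = [("a", "x", "p"), ("a", "x", "q"), ("b", "y", "q"),
            ("b", "y", "p"), ("a", "x", "p")]
    labels, modes = kmodes(data, [("a", "x", "p"), ("b", "y", "q")])
    print(labels)  # [0, 0, 1, 1, 0]
    print(modes)   # [('a', 'x', 'p'), ('b', 'y', 'q')]
```

In a streaming setting, a routine like this would typically be rerun on each newly arrived chunk or time window; how k-modestream chooses the number of clusters and seeds the modes for each interval is specific to the paper and is not reproduced in this sketch.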
Pages: 295-303
Number of pages: 8