K-modestream algorithm for clustering categorical data streams

Cited by: 0
Authors
Ravi Sankar Sangam
Hari Om
Affiliation
[1] Indian Institute of Technology (Indian School of Mines), Department of Computer Science and Engineering
Keywords
Data mining; Data streams; Clustering; K-modes
DOI
10.1007/s40012-017-0170-z
Abstract
Clustering categorical data streams is challenging because new data points are continuously added to the existing database at a rapid pace and categorical values have no natural order. Several algorithms have recently been proposed for clustering categorical data streams. However, all of these schemes require the user to pre-specify the number of clusters, which is not trivial and makes them inefficient in a data stream environment. In this paper, we propose a new clustering algorithm, named k-modestream, which follows the k-modes paradigm to dynamically cluster categorical data streams. It automatically computes the number of clusters and their initial modes simultaneously at regular time intervals. We analyse the time complexity of our scheme and evaluate its efficacy through experiments on synthetic and real-world datasets.
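For readers unfamiliar with the k-modes paradigm the abstract refers to, the following is a minimal sketch of the standard batch k-modes building blocks: the simple matching dissimilarity and the per-attribute mode update. It is not the authors' k-modestream algorithm. The abstract does not describe how the number of clusters or the initial modes are computed, so here they are simply passed in by the caller. All function names (`matching_dissimilarity`, `column_modes`, `assign`, `k_modes`) are hypothetical and chosen for illustration only.

```python
from collections import Counter


def matching_dissimilarity(a, b):
    """Count the attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(records):
    """Return the most frequent value of each attribute as a tuple.

    Ties are broken by first occurrence (Counter.most_common ordering).
    """
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))


def assign(records, modes):
    """Return, for each record, the index of its nearest mode."""
    return [
        min(range(len(modes)), key=lambda j: matching_dissimilarity(r, modes[j]))
        for r in records
    ]


def k_modes(records, initial_modes, max_iter=10):
    """Plain batch k-modes; initial_modes is an assumption supplied by the caller."""
    modes = list(initial_modes)
    for _ in range(max_iter):
        labels = assign(records, modes)
        new_modes = []
        for j, old_mode in enumerate(modes):
            members = [r for r, label in zip(records, labels) if label == j]
            # Keep the previous mode if a cluster received no records.
            new_modes.append(column_modes(members) if members else old_mode)
        if new_modes == modes:
            break  # modes are stable, so the assignment will not change
        modes = new_modes
    return assign(records, modes), modes


records = [("a", "x"), ("a", "y"), ("b", "z"), ("b", "z")]
labels, modes = k_modes(records, [("a", "x"), ("b", "z")])
print(labels, modes)
```

Records and modes are represented as tuples of attribute values so that modes can be compared for equality between iterations. In a streaming setting, a scheme like the one the abstract describes would presumably rerun or refresh such a step at regular time intervals, but those details are not given in this record.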
Pages: 295-303
Number of pages: 8