P-ROCK: A Sustainable Clustering Algorithm for Large Categorical Datasets

Cited by: 11
Authors
Altameem, Ayman [1]
Poonia, Ramesh Chandra [2]
Kumar, Ankit [3]
Raja, Linesh [4]
Saudagar, Abdul Khader Jilani [5]
Affiliations
[1] King Saud Univ, Dept Comp Sci & Engn, Coll Appl Studies & Community Serv, Riyadh 11533, Saudi Arabia
[2] CHRIST Univ, Dept Comp Sci, Bangalore 560029, Karnataka, India
[3] GLA Univ, Dept Comp Engn & Applicat, Mathura, Uttar Pradesh, India
[4] Manipal Univ Jaipur, Dept Comp Applicat, Jaipur 303007, Rajasthan, India
[5] Imam Mohammad Ibn Saud Islamic Univ IMSIU, Dept Informat Syst, Riyadh 11432, Saudi Arabia
Keywords
ROCK; K-means algorithm; clustering approaches; unsupervised learning; K-histogram
DOI
10.32604/iasc.2023.027579
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Data clustering is crucial to data processing and analytics. New clustering methods help address the challenge of evaluating and extracting information from big data. Both numerical and categorical data can be clustered, but existing clustering methods favor numerical data and largely ignore categorical data. Until recently, the only way to cluster categorical data was to convert it to a numeric representation and then apply existing numeric clustering methods. However, these algorithms could not exploit the categorical nature of the data. Subsequently, extensions of traditional methods to handle categorical data were suggested, and several new clustering methods and extensions have been proposed in recent years. ROCK is an adaptable and straightforward algorithm that clusters data by calculating the similarity between data items. This paper modifies the algorithm by creating a parameterized version that takes specific algorithm parameters as input and outputs satisfactory cluster structures. The modified algorithm is called the parameterized ROCK algorithm (P-ROCK). The proposed modification makes the original algorithm more flexible through user-defined parameters. A detailed hypothesis was developed and later validated with experimental results on real-world datasets using the proposed P-ROCK algorithm. A comparison with the original ROCK algorithm is also provided. Experimental results show that the proposed algorithm is on par with the original ROCK algorithm, with an accuracy of 97.9%. The proposed P-ROCK algorithm also improves the runtime and is more flexible and scalable.
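Illustrative note: the abstract does not list the parameters P-ROCK exposes, so the following is only a minimal sketch of the link-based similarity commonly associated with the original ROCK algorithm, not the authors' implementation. It assumes Jaccard similarity over sets of categorical items, a user-supplied threshold `theta`, and that a record is not counted as its own neighbor. The function names and the toy records are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two sets of categorical items."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def neighbor_sets(records, theta):
    """For each record, indices of the other records with similarity >= theta."""
    neighbors = [set() for _ in records]
    for i, j in combinations(range(len(records)), 2):
        if jaccard(records[i], records[j]) >= theta:
            neighbors[i].add(j)
            neighbors[j].add(i)
    return neighbors


def link_counts(neighbors):
    """Map each pair (i, j) to its number of common neighbors (pairs with 0 omitted)."""
    links = {}
    for i, j in combinations(range(len(neighbors)), 2):
        common = len(neighbors[i] & neighbors[j])
        if common:
            links[(i, j)] = common
    return links


# Toy categorical records (hypothetical data, for illustration only).
records = [
    {"a", "b", "c"},
    {"a", "b", "d"},
    {"a", "c", "d"},
    {"x", "y", "z"},
]

# theta is passed in by the caller, in the spirit of a parameterized variant.
neighbors = neighbor_sets(records, theta=0.5)
links = link_counts(neighbors)
```

In this sketch, pairs of records with more common neighbors would be merged first by an agglomerative step, which is omitted here. Raising `theta` makes the neighbor relation stricter and tends to produce more, smaller clusters.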
Pages: 553-566
Number of pages: 14