P-ROCK: A Sustainable Clustering Algorithm for Large Categorical Datasets

Cited by: 11
Authors
Altameem, Ayman [1]
Poonia, Ramesh Chandra [2]
Kumar, Ankit [3]
Raja, Linesh [4]
Saudagar, Abdul Khader Jilani [5]
Affiliations
[1] King Saud Univ, Dept Comp Sci & Engn, Coll Appl Studies & Community Serv, Riyadh 11533, Saudi Arabia
[2] CHRIST Univ, Dept Comp Sci, Bangalore 560029, Karnataka, India
[3] GLA Univ, Dept Comp Engn & Applicat, Mathura, Uttar Pradesh, India
[4] Manipal Univ Jaipur, Dept Comp Applicat, Jaipur 303007, Rajasthan, India
[5] Imam Mohammad Ibn Saud Islamic Univ IMSIU, Dept Informat Syst, Riyadh 11432, Saudi Arabia
Keywords
ROCK; K-means algorithm; clustering approaches; unsupervised learning; K-histogram
DOI
10.32604/iasc.2023.027579
CLC number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Data clustering is crucial to data processing and analytics. Newer clustering methods address the challenge of evaluating and extracting information from big data. Both numerical and categorical data can be clustered, but existing clustering methods favor numerical data and largely ignore categorical data. Until recently, the only way to cluster categorical data was to convert it to a numeric representation and then apply existing numeric clustering methods. However, these algorithms could not exploit the categorical nature of the data. Suggestions for extending traditional methods to handle categorical data followed, and several new clustering methods and extensions have been proposed in recent years. ROCK is an adaptable and straightforward algorithm that clusters data by calculating the similarity between data sets. This paper aims to modify the algorithm by creating a parameterized version that takes specific algorithm parameters as input and outputs satisfactory cluster structures. The modified algorithm is called the parameterized ROCK algorithm (P-ROCK). The proposed modification makes the original algorithm more flexible by using user-defined parameters. A detailed hypothesis was developed and later validated with experimental results on real-world datasets using the proposed P-ROCK algorithm. A comparison with the original ROCK algorithm is also provided. Experimental results show that the proposed algorithm is on par with the original ROCK algorithm, with an accuracy of 97.9%. The proposed P-ROCK algorithm also improves runtime and is more flexible and scalable.
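The abstract does not say which parameters P-ROCK exposes, so the following is only a minimal sketch of the neighbor and link computation used by the original ROCK algorithm, with the similarity threshold passed in as a user-supplied argument. The function names `jaccard` and `rock_links`, the parameter name `theta`, and the toy records are assumptions made for illustration; this is not the authors' implementation, and it excludes a point from its own neighbor set for simplicity.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two sets of categorical items."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def rock_links(records, theta):
    """Count shared neighbors (links) for every pair of records.

    theta is a hypothetical user-defined similarity threshold in [0, 1].
    """
    n = len(records)
    # Two records are neighbors when their similarity is at least theta.
    neighbors = [set() for _ in range(n)]
    for i, j in combinations(range(n), 2):
        if jaccard(records[i], records[j]) >= theta:
            neighbors[i].add(j)
            neighbors[j].add(i)
    # link(i, j) is the number of neighbors that i and j have in common.
    links = {}
    for i, j in combinations(range(n), 2):
        common = len(neighbors[i] & neighbors[j])
        if common:
            links[(i, j)] = common
    return links


# Toy categorical records (illustrative data, not from the paper).
records = [
    {"a", "b", "c"},
    {"a", "b", "d"},
    {"a", "c", "d"},
    {"x", "y", "z"},
]
links = rock_links(records, theta=0.5)
```

Raising `theta` makes the neighbor relation stricter and yields fewer links, which is one way a user-defined parameter can change the resulting cluster structure.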
Pages: 553-566
Page count: 14