Superior Parallel Big Data Clustering Through Competitive Stochastic Sample Size Optimization in Big-Means

Cited by: 3
Authors
Mussabayev, Rustam [1, 2]
Mussabayev, Ravil [1, 3]
Affiliations
[1] Satbayev Univ, Satbayev St 22, Alma Ata 050013, Kazakhstan
[2] Inst Informat & Computat Technol, Lab Anal & Modeling Informat Proc, Pushkin St 125, Alma Ata 050010, Kazakhstan
[3] Univ Washington, Dept Math, Padelford Hall C-138, Seattle, WA 98195 USA
Source
INTELLIGENT INFORMATION AND DATABASE SYSTEMS, PT II, ACIIDS 2024 | 2024 / Vol. 14796
Keywords
Big-means Clustering; Parallel Computing; Data Mining; Stochastic Variation; Sample Size; Competitive Environment; Parallelization Strategy; Machine Learning; Big Data Analysis; Optimization; Cluster Analysis; K-means; K-means plus; Unsupervised Learning;
DOI
10.1007/978-981-97-4985-0_18
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper introduces a novel K-means clustering algorithm that extends the conventional Big-means methodology. The proposed method integrates parallel processing, stochastic sampling, and competitive optimization into a scalable variant designed for big data applications, addressing the scalability and computation-time limitations of traditional techniques. During execution, the algorithm dynamically adjusts the sample size used by each worker to optimize performance. The results obtained with these sample sizes are continually analyzed, which makes it possible to identify the most efficient configuration. Introducing competition among workers that use different sample sizes further improves the efficiency of Big-means. In essence, the algorithm balances computation time against clustering quality by employing a stochastic, competitive sampling strategy in a parallel computing setting.
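The abstract describes the mechanism only at a high level. The following is a minimal sequential sketch in Python, not the authors' implementation, of how competitive sample-size selection could be layered on a Big-means-style loop (K-means run repeatedly on random samples, with the incumbent centroids carried forward). Everything specific here is an assumption made for illustration: workers are simulated one after another rather than run in parallel; each worker picks its sample size at random with probability proportional to a hypothetical win count; candidates are compared on a shared random probe sample; and the score (probe-cost improvement divided by sample size) is a crude stand-in for the time/quality trade-off. The names `competitive_big_means`, `sizes`, `probe_size`, and `wins` are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sse(points, centroids):
    """K-means objective: sum of squared distances to the nearest centroid."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


def kmeans_pp(points, k, rng):
    """K-means++ seeding: each new centroid is drawn with probability ~ D^2."""
    centroids = [list(rng.choice(points))]
    while len(centroids) < k:
        weights = [min(sq_dist(p, c) for c in centroids) for p in points]
        centroids.append(list(rng.choices(points, weights=weights)[0]))
    return centroids


def lloyd(points, centroids, iters=10):
    """Plain Lloyd iterations on `points`, starting from `centroids`."""
    centroids = [list(c) for c in centroids]
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for p in points:
            j = min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
            groups[j].append(p)
        for j, g in enumerate(groups):
            if g:  # simplification: an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(g) for col in zip(*g)]
    return centroids


def competitive_big_means(data, k, sizes, rounds=20, workers=4,
                          probe_size=100, seed=0):
    """Hypothetical sketch: workers with randomly chosen sample sizes compete."""
    rng = random.Random(seed)
    best = kmeans_pp(rng.sample(data, max(sizes)), k, rng)  # incumbent centroids
    wins = {s: 1.0 for s in sizes}  # hypothetical per-size score (1.0 = prior)
    for _ in range(rounds):
        probe = rng.sample(data, probe_size)  # shared yardstick for this round
        base = sse(probe, best)
        trials = []
        for _ in range(workers):  # simulated sequentially, not in parallel
            s = rng.choices(sizes, weights=[wins[x] for x in sizes])[0]
            cand = lloyd(rng.sample(data, s), best)  # start from the incumbent
            gain = base - sse(probe, cand)
            trials.append((gain / s, gain, s, cand))  # gain per sampled point
        _, gain, s, cand = max(trials, key=lambda t: t[0])
        if gain > 0:  # the winner replaces the incumbent only if it helps
            best = cand
            wins[s] += 1.0
    return best, wins
```

In this sketch, a call such as `competitive_big_means(points, k=5, sizes=[1000, 5000, 20000])` returns the final centroids together with the win counts (the data must contain at least `max(sizes)` and `probe_size` points). Sample sizes whose workers win more rounds are drawn more often in later rounds, which is one simple way to let an efficient configuration emerge; dividing the gain by the sample size penalizes large samples, since they cost more K-means work per round.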
Pages: 224-236
Page count: 13