Superior Parallel Big Data Clustering Through Competitive Stochastic Sample Size Optimization in Big-Means

Cited by: 3
Authors
Mussabayev, Rustam [1, 2]
Mussabayev, Ravil [1, 3]
Affiliations
[1] Satbayev Univ, Satbayev St 22, Alma Ata 050013, Kazakhstan
[2] Inst Informat & Computat Technol, Lab Anal & Modeling Informat Proc, Pushkin St 125, Alma Ata 050010, Kazakhstan
[3] Univ Washington, Dept Math, Padelford Hall C-138, Seattle, WA 98195 USA
Source
INTELLIGENT INFORMATION AND DATABASE SYSTEMS, PT II, ACIIDS 2024 | 2024, vol. 14796
Keywords
Big-means Clustering; Parallel Computing; Data Mining; Stochastic Variation; Sample Size; Competitive Environment; Parallelization Strategy; Machine Learning; Big Data Analysis; Optimization; Cluster Analysis; K-means; K-means++; Unsupervised Learning
DOI
10.1007/978-981-97-4985-0_18
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper introduces a novel K-means clustering algorithm, an advancement on the conventional Big-means methodology. The proposed method efficiently integrates parallel processing, stochastic sampling, and competitive optimization to create a scalable variant designed for big data applications. It addresses scalability and computation time challenges typically faced with traditional techniques. The algorithm adjusts sample sizes dynamically for each worker during execution, optimizing performance. Data from these sample sizes are continually analyzed, facilitating the identification of the most efficient configuration. By incorporating a competitive element among workers using different sample sizes, efficiency within the Big-means algorithm is further stimulated. In essence, the algorithm balances computational time and clustering quality by employing a stochastic, competitive sampling strategy in a parallel computing setting.
Pages: 224-236
Page count: 13
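
The strategy summarized in the abstract (workers that each draw a random sample of a randomly chosen size, refine a shared set of centroids with K-means, and compete to improve it) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the sample-size range, the fixed evaluation subset used to compare candidates, and the sequential simulation of workers in a loop are all assumptions made for illustration.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def objective(points, centroids):
    """Mean squared distance from each point to its nearest centroid."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in points) / len(points)


def lloyd(points, centroids, iters=10):
    """Run a few Lloyd (K-means) iterations; an empty cluster keeps its old centroid."""
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
            groups[nearest].append(p)
        centroids = [
            tuple(sum(x) / len(g) for x in zip(*g)) if g else c
            for g, c in zip(groups, centroids)
        ]
    return centroids


def competitive_big_means(data, k, n_workers=4, rounds=20,
                          size_range=(20, 200), seed=0):
    """Hypothetical sketch: workers with random sample sizes compete to
    improve a shared incumbent solution. Workers are simulated one after
    another here; a real system would run them in parallel."""
    rng = random.Random(seed)
    best_c = rng.sample(data, k)                      # initial centroids
    # Assumption: candidates are compared on one fixed evaluation subset.
    eval_set = rng.sample(data, min(len(data), 500))
    best_f = objective(eval_set, best_c)
    wins = {}                                         # sample size -> improvements
    for _ in range(rounds):
        for _worker in range(n_workers):
            size = min(rng.randint(*size_range), len(data))
            sample = rng.sample(data, size)
            candidate = lloyd(sample, best_c)         # start from the incumbent
            f = objective(eval_set, candidate)
            if f < best_f:                            # the winning worker updates it
                best_c, best_f = candidate, f
                wins[size] = wins.get(size, 0) + 1
    return best_c, best_f, wins
```

The `wins` dictionary records which sample sizes produced improvements, which is one simple way to observe the trade-off the abstract describes: small samples are cheap to cluster, while larger ones tend to give more reliable centroid updates.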