Superior Parallel Big Data Clustering Through Competitive Stochastic Sample Size Optimization in Big-Means

Cited by: 3
Authors
Mussabayev, Rustam [1, 2]
Mussabayev, Ravil [1, 3]
Affiliations
[1] Satbayev Univ, Satbayev St 22, Alma Ata 050013, Kazakhstan
[2] Inst Informat & Computat Technol, Lab Anal & Modeling Informat Proc, Pushkin St 125, Alma Ata 050010, Kazakhstan
[3] Univ Washington, Dept Math, Padelford Hall C-138, Seattle, WA 98195 USA
Source
INTELLIGENT INFORMATION AND DATABASE SYSTEMS, PT II, ACIIDS 2024 | 2024, Vol. 14796
Keywords
Big-means Clustering; Parallel Computing; Data Mining; Stochastic Variation; Sample Size; Competitive Environment; Parallelization Strategy; Machine Learning; Big Data Analysis; Optimization; Cluster Analysis; K-means; K-means plus; Unsupervised Learning;
DOI
10.1007/978-981-97-4985-0_18
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper introduces a novel K-means clustering algorithm, an advancement on the conventional Big-means methodology. The proposed method efficiently integrates parallel processing, stochastic sampling, and competitive optimization to create a scalable variant designed for big data applications. It addresses scalability and computation time challenges typically faced with traditional techniques. The algorithm adjusts sample sizes dynamically for each worker during execution, optimizing performance. Data from these sample sizes are continually analyzed, facilitating the identification of the most efficient configuration. By incorporating a competitive element among workers using different sample sizes, efficiency within the Big-means algorithm is further stimulated. In essence, the algorithm balances computational time and clustering quality by employing a stochastic, competitive sampling strategy in a parallel computing setting.
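The strategy summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`lloyd`, `competitive_big_means`), the parameters, the use of the mean squared distance on each worker's own sample as the comparison score, and the sequential loop standing in for parallel workers are all assumptions made for illustration. Each simulated worker draws a sample size at random from a candidate list, refines the shared incumbent centroids on a random sample of that size, and replaces the incumbent only if its score is better; a per-size win count records which sample sizes produced improvements.

```python
import numpy as np


def lloyd(sample, centroids, n_iter=10):
    """Run a few Lloyd (K-means) iterations on a sample.

    Returns the refined centroids and the mean squared distance of the
    sample points to their nearest centroid.
    """
    centroids = centroids.copy()
    for _ in range(n_iter):
        d2 = ((sample[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(len(centroids)):
            members = sample[labels == j]
            if len(members) > 0:  # leave empty clusters where they are
                centroids[j] = members.mean(axis=0)
    d2 = ((sample[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return centroids, float(d2.min(axis=1).mean())


def competitive_big_means(data, k, sample_sizes, n_workers=4, n_rounds=20, seed=0):
    """Hypothetical sketch of competing sample sizes (workers simulated in a loop)."""
    rng = np.random.default_rng(seed)
    # Assumption: start from k distinct data points chosen uniformly at random.
    best_centroids = data[rng.choice(len(data), size=k, replace=False)].astype(float)
    best_score = np.inf
    wins = {s: 0 for s in sample_sizes}  # improvements credited to each size

    for _ in range(n_rounds):
        for _worker in range(n_workers):
            s = int(rng.choice(sample_sizes))  # stochastic sample size
            sample = data[rng.choice(len(data), size=s, replace=False)]
            centroids, score = lloyd(sample, best_centroids)
            # Assumption: per-point score makes different sample sizes comparable.
            if score < best_score:
                best_centroids, best_score = centroids, score
                wins[s] += 1
    return best_centroids, best_score, wins
```

A call such as `competitive_big_means(data, k=3, sample_sizes=[50, 100, 200])` returns the incumbent centroids, their score, and the win counts. Scores measured on small samples are noisier, so a more careful variant would evaluate candidates on a common held-out sample, and a genuinely parallel version would run the workers in separate processes.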
Pages: 224-236
Number of pages: 13