Superior Parallel Big Data Clustering Through Competitive Stochastic Sample Size Optimization in Big-Means

Cited by: 3
Authors
Mussabayev, Rustam [1, 2]
Mussabayev, Ravil [1, 3]
Affiliations
[1] Satbayev Univ, Satbayev St 22, Alma Ata 050013, Kazakhstan
[2] Inst Informat & Computat Technol, Lab Anal & Modeling Informat Proc, Pushkin St 125, Alma Ata 050010, Kazakhstan
[3] Univ Washington, Dept Math, Padelford Hall C-138, Seattle, WA 98195 USA
Source
INTELLIGENT INFORMATION AND DATABASE SYSTEMS, PT II, ACIIDS 2024 | 2024, Vol. 14796
Keywords
Big-means Clustering; Parallel Computing; Data Mining; Stochastic Variation; Sample Size; Competitive Environment; Parallelization Strategy; Machine Learning; Big Data Analysis; Optimization; Cluster Analysis; K-means; K-means++; Unsupervised Learning
DOI
10.1007/978-981-97-4985-0_18
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper introduces a novel K-means clustering algorithm, an advancement on the conventional Big-means methodology. The proposed method efficiently integrates parallel processing, stochastic sampling, and competitive optimization to create a scalable variant designed for big data applications. It addresses scalability and computation time challenges typically faced with traditional techniques. The algorithm adjusts sample sizes dynamically for each worker during execution, optimizing performance. Data from these sample sizes are continually analyzed, facilitating the identification of the most efficient configuration. By incorporating a competitive element among workers using different sample sizes, efficiency within the Big-means algorithm is further stimulated. In essence, the algorithm balances computational time and clustering quality by employing a stochastic, competitive sampling strategy in a parallel computing setting.
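The abstract gives no pseudocode, so the following is only a rough sketch of the idea it describes, not the authors' implementation. It assumes the usual Big-means loop (run K-means on a random sample, starting from the best centroids found so far, and keep the result only if it improves the objective) and adds a competitive element by letting each simulated worker draw its own random sample size. The function names `lloyd` and `competitive_big_means`, the parameters `n_workers`, `n_rounds`, `min_size`, `max_size`, the use of mean squared error to compare samples of different sizes, and the `wins` tally are all assumptions made for illustration. Workers run sequentially here; a real implementation would run them in parallel.

```python
import numpy as np


def lloyd(sample, centroids, n_iter=10):
    """Run a few Lloyd (K-means) iterations on `sample`, starting from `centroids`."""
    centroids = centroids.copy()
    for _ in range(n_iter):
        # Squared distance from every sample point to every centroid.
        d2 = ((sample[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(len(centroids)):
            members = sample[labels == j]
            if len(members) > 0:  # an empty cluster keeps its old centroid
                centroids[j] = members.mean(axis=0)
    d2 = ((sample[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    # Assumption: use the MEAN squared error so that samples of
    # different sizes can be compared on the same scale.
    return centroids, d2.min(axis=1).mean()


def competitive_big_means(data, k, n_workers=4, n_rounds=20,
                          min_size=50, max_size=500, seed=0):
    """Hypothetical sketch: workers with random sample sizes compete to improve
    one shared set of centroids. Assumes len(data) >= k."""
    rng = np.random.default_rng(seed)
    max_size = min(max_size, len(data))
    min_size = min(max(min_size, k), max_size)

    # Start from k distinct random data points.
    start = rng.choice(len(data), size=k, replace=False)
    best_centroids = data[start].astype(float)
    best_score = np.inf
    wins = {}  # sample size -> number of times it produced an improvement

    for _ in range(n_rounds):
        for _ in range(n_workers):  # sequential stand-in for parallel workers
            size = int(rng.integers(min_size, max_size + 1))
            idx = rng.choice(len(data), size=size, replace=False)
            centroids, score = lloyd(data[idx], best_centroids)
            if score < best_score:  # this worker wins the round
                best_centroids, best_score = centroids, score
                wins[size] = wins.get(size, 0) + 1
    return best_centroids, best_score, wins


# Usage on synthetic data: three Gaussian blobs in two dimensions.
rng = np.random.default_rng(1)
data = np.vstack([rng.normal(loc=c, scale=0.3, size=(1000, 2))
                  for c in (0.0, 5.0, 10.0)])
centroids, score, wins = competitive_big_means(data, k=3)
```

The `wins` dictionary is one simple way to record which sample sizes produced improvements; how the paper actually analyzes and adapts sample sizes is not specified in the abstract.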
Pages: 224-236
Page count: 13