Superior Parallel Big Data Clustering Through Competitive Stochastic Sample Size Optimization in Big-Means

Times cited: 3

Authors:

Mussabayev, Rustam [1, 2]

Mussabayev, Ravil [1, 3]

Affiliations:

[1] Satbayev Univ, Satbayev St 22, Alma Ata 050013, Kazakhstan

[2] Inst Informat & Computat Technol, Lab Anal & Modeling Informat Proc, Pushkin St 125, Alma Ata 050010, Kazakhstan

[3] Univ Washington, Dept Math, Padelford Hall C-138, Seattle, WA 98195 USA

Source:

INTELLIGENT INFORMATION AND DATABASE SYSTEMS, PT II, ACIIDS 2024 | 2024 / Vol. 14796

Keywords:

Big-means Clustering; Parallel Computing; Data Mining; Stochastic Variation; Sample Size; Competitive Environment; Parallelization Strategy; Machine Learning; Big Data Analysis; Optimization; Cluster Analysis; K-means; K-means++; Unsupervised Learning

DOI:

10.1007/978-981-97-4985-0_18

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

This paper introduces a novel K-means clustering algorithm that advances the conventional Big-means methodology. The proposed method integrates parallel processing, stochastic sampling, and competitive optimization into a scalable variant designed for big data applications, addressing the scalability and computation-time challenges typically faced by traditional techniques. During execution, the algorithm dynamically adjusts the sample size used by each worker to optimize performance. Performance data for these sample sizes are continually analyzed, which makes it possible to identify the most efficient configuration. Introducing competition among workers that use different sample sizes further improves the efficiency of the Big-means algorithm. In essence, the algorithm balances computation time and clustering quality by employing a stochastic, competitive sampling strategy in a parallel computing setting.
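The abstract describes the method only at a high level. The Python/NumPy sketch below illustrates the general idea of competitive sample-size selection in a Big-means-style loop, under assumptions of our own that the record does not confirm: workers are simulated sequentially rather than run in parallel; each worker draws its sample size with probability proportional to a hypothetical "win" score; candidate solutions are compared by mean per-point cost on their own samples so that different sample sizes are comparable; and empty clusters simply keep their previous centroid. The names `competitive_big_means`, `lloyd`, `kmeanspp_init`, and `sse` are hypothetical. This is not the authors' implementation.

```python
import numpy as np

def sse(X, C):
    """Sum of squared distances from each row of X to its nearest centroid in C."""
    d = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    return d.min(axis=1).sum()

def lloyd(X, C, iters=20):
    """Plain Lloyd iterations from centroids C; empty clusters keep their old centroid (simplification)."""
    C = C.copy()
    for _ in range(iters):
        labels = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        for j in range(len(C)):
            members = X[labels == j]
            if len(members):
                C[j] = members.mean(axis=0)
    return C

def kmeanspp_init(X, k, rng):
    """Standard K-means++ seeding on X."""
    C = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = ((X[:, None, :] - np.array(C)[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        C.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(C, dtype=float)

def competitive_big_means(X, k, sizes, n_workers=4, rounds=25, seed=0):
    """Hypothetical sketch: workers with different sample sizes compete to improve a shared incumbent."""
    rng = np.random.default_rng(seed)
    wins = {s: 1.0 for s in sizes}  # assumed score per candidate size (starts at 1 so every size can be drawn)
    first = X[rng.choice(len(X), size=min(sizes), replace=False)]
    best_C = kmeanspp_init(first, k, rng)
    best_cost = sse(first, best_C) / len(first)  # mean per-point cost, comparable across sample sizes
    for _ in range(rounds):
        probs = np.array([wins[s] for s in sizes])
        probs /= probs.sum()
        results = []
        for _ in range(n_workers):  # simulated workers, run one after another
            s = sizes[rng.choice(len(sizes), p=probs)]
            S = X[rng.choice(len(X), size=s, replace=False)]
            C = lloyd(S, best_C)  # each worker refines the shared incumbent on its own sample
            results.append((sse(S, C) / s, s, C))
        cost, s, C = min(results, key=lambda r: r[0])  # the round's winning worker
        if cost < best_cost:  # accept only if it beats the incumbent's best mean cost
            best_cost, best_C = cost, C
            wins[s] += 1.0  # reward the sample size that produced the improvement
    best_size = max(sizes, key=lambda s: wins[s])
    return best_C, best_size, wins

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
    X = np.vstack([c + rng.normal(size=(2000, 2)) for c in centers])
    C, best_size, wins = competitive_big_means(X, k=3, sizes=[100, 400, 1600])
    print(C.shape, best_size, wins)
```

Comparing mean per-point costs across different random samples is noisy, so in this sketch a small sample can occasionally "win" by chance. A real parallel implementation would also need a policy for sharing the incumbent between concurrently running workers.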


Pages: 224–236

Number of pages: 13
