p-PIC: Parallel power iteration clustering for big data

Times cited: 26

Authors:

Yan, Weizhong [1]

Brahmakshatriya, Umang [1]

Xue, Ya [1]

Gilder, Mark [2]

Wise, Bowden [3]

Affiliations:

[1] GE Global Res Ctr, Machine Learning Lab, Niskayuna, NY 12039 USA

[2] GE Global Res Ctr, Comp & Cyber Secur Lab, Niskayuna, NY 12039 USA

[3] GE Global Res Ctr, Knowledge Discovery Lab, Niskayuna, NY 12039 USA

Source:

JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING | 2013 / Vol. 73 / No. 3

Keywords:

Big data; Clustering; Cloud computing; Data-mining; Distributed computing; Machine learning; Parallel computing; Spectral clustering; ALGORITHM

DOI:

10.1016/j.jpdc.2012.06.009

CLC number (Chinese Library Classification):

TP301 [Theory, methods]

Subject classification code:

081202

Abstract:

Power iteration clustering (PIC) is a newly developed clustering algorithm. It performs clustering by embedding data points in a low-dimensional subspace derived from the similarity matrix. Compared with traditional clustering algorithms, PIC is simple, fast, and relatively scalable. However, it requires that the data and its associated similarity matrix fit into memory, which makes the algorithm infeasible for big data applications. This paper attempts to expand PIC's data scalability by implementing a parallel version of the algorithm, parallel power iteration clustering (p-PIC). While this paper focuses on exploring different parallelization strategies and implementation details for minimizing computation and communication costs, we have also paid great attention to ensuring that the algorithm works well on low-end commodity computers (COTS-based clusters and the general-purpose servers found at most commercial cloud providers). The experimental results demonstrate that the proposed p-PIC algorithm is highly scalable with respect to both data size and compute resources. (c) 2012 Elsevier Inc. All rights reserved.
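To make the abstract's one-sentence description of PIC concrete, here is a minimal sketch. It assumes the standard PIC formulation (row-normalized affinity matrix W = D^-1 A, a starting vector proportional to the row degrees, repeated v <- Wv / ||Wv||_1 stopped early by an "acceleration" test, then k-means on the one-dimensional embedding v), a Gaussian kernel with sigma = 1, and a row-block partition of W across workers simulated with threads. The partitioning scheme, the helper names (`partition`, `affinity_rows`, `matvec_block`, `kmeans_1d`, `pic`), and the threshold 1e-5/n are illustrative assumptions, not the p-PIC implementation described in the paper.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def partition(n, workers):
    """Split row indices 0..n-1 into at most `workers` contiguous row blocks."""
    size = max(1, math.ceil(n / workers))
    return [range(s, min(s + size, n)) for s in range(0, n, size)]


def affinity_rows(points, rows, sigma):
    """Gaussian affinities A[i][j] for the rows i owned by one worker (zero diagonal).

    Assumption: every worker can read all points, so no data moves here.
    """
    block = []
    for i in rows:
        row = []
        for j, q in enumerate(points):
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], q))
            row.append(0.0 if i == j else math.exp(-d2 / (2 * sigma ** 2)))
        block.append(row)
    return block


def matvec_block(w_block, v):
    """One worker's slice of W v: it needs only its own rows plus the full vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in w_block]


def kmeans_1d(values, k, iters=100):
    """Plain 1-D k-means; centroids start evenly spaced between min and max."""
    lo, hi = min(values), max(values)
    centroids = [lo + (hi - lo) * c / max(k - 1, 1) for c in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(x - centroids[c])) for x in values]
        new = []
        for c in range(k):
            members = [x for x, lab in zip(values, labels) if lab == c]
            new.append(sum(members) / len(members) if members else centroids[c])
        if new == centroids:
            break
        centroids = new
    return labels


def pic(points, k=2, workers=2, sigma=1.0, max_iter=1000):
    """Power iteration clustering with W split into row blocks across threads."""
    n = len(points)
    blocks = partition(n, workers)
    m = len(blocks)
    with ThreadPoolExecutor(max_workers=m) as pool:
        # Each worker builds only its own rows of A, so its row sums (degrees) are local.
        a_blocks = list(pool.map(affinity_rows, [points] * m, blocks, [sigma] * m))
        # W = D^-1 A. Each row is scaled by its own sum, so a real worker could do
        # this locally; it is done here in one place for brevity.
        w_blocks = [[[x / s for x in row] for row, s in zip(blk, map(sum, blk))]
                    for blk in a_blocks]
        degrees = [sum(row) for blk in a_blocks for row in blk]
        total = sum(degrees)                 # one global reduction
        v = [d / total for d in degrees]     # degree-based starting vector
        eps = 1e-5 / n                       # acceleration threshold (assumed value)
        prev_delta = None
        for _ in range(max_iter):
            slices = pool.map(matvec_block, w_blocks, [v] * m)
            wv = [x for s in slices for x in s]  # gather the slices in row order
            norm = sum(abs(x) for x in wv)
            v_new = [x / norm for x in wv]       # L1-normalize
            delta = [abs(a - b) for a, b in zip(v_new, v)]
            v = v_new
            # Stop early, once the per-element change has itself stopped changing.
            if prev_delta is not None:
                accel = max(abs(a - b) for a, b in zip(delta, prev_delta))
                if accel < eps:
                    break
            prev_delta = delta
    # Cluster the one-dimensional embedding v.
    return kmeans_1d(v, k)


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0),
           (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
    # The first three points should share one label, the last five the other.
    print(pic(pts, k=2, workers=3))
```

In this sketch only the length-n vector v crosses block boundaries on each iteration; the affinity rows and their normalization belong to the block that owns them, apart from one global sum for the starting vector. In a real cluster the threads would be separate processes or machines and the gather would be a collective operation such as MPI's all-gather; CPython threads give no speedup for this pure-Python arithmetic because of the GIL.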


Pages: 352-359

Number of pages: 8

Related records:

[1] A survey on parallel clustering algorithms for Big Data
Dafir, Zineb
Lamari, Yasmine
Slaoui, Said Chah
ARTIFICIAL INTELLIGENCE REVIEW, 2021, 54 (04): 2411-2443
[2] A GPU Based Parallel Clustering Method for Electric Power Big Data
Ji, Cong
Xiong, Zheng
Fang, Chao
Lv, Hui
Zhang, Kaizhen
2017 4TH INTERNATIONAL CONFERENCE ON INFORMATION SCIENCE AND CONTROL ENGINEERING (ICISCE), 2017: 29-33
[3] Parallel and distributed clustering framework for big spatial data mining
Bendechache, Malika
Tari, A-Kamel
Kechadi, M-Tahar
INTERNATIONAL JOURNAL OF PARALLEL EMERGENT AND DISTRIBUTED SYSTEMS, 2019, 34 (06): 671-689
[4] Using Parallel Hierarchical Clustering to Address Spatial Big Data Challenges
Woodley, Alan
Tang, Ling-Xiang
Geva, Shlomo
Nayak, Richi
Chappell, Timothy
2016 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2016: 2692-2698
[5] An Efficient Parallel Algorithm for Clustering Big Data based on the Spark Framework
Dafir, Zineb
Slaoui, Said
INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2022, 13 (07): 890-896
[6] Parallel K-prototypes for Clustering Big Data
Ben HajKacem, Mohamed Aymen
Ben N'cir, Chiheb-Eddine
Essoussi, Nadia
COMPUTATIONAL COLLECTIVE INTELLIGENCE (ICCCI 2015), PT II, 2015, 9330: 628-637
[7] Parallel batch k-means for Big data clustering
Alguliyev, Rasim M.
Aliguliyev, Ramiz M.
Sukhostat, Lyudmila V.
COMPUTERS & INDUSTRIAL ENGINEERING, 2021, 152
[8] Superior Parallel Big Data Clustering Through Competitive Stochastic Sample Size Optimization in Big-Means
Mussabayev, Rustam
Mussabayev, Ravil
INTELLIGENT INFORMATION AND DATABASE SYSTEMS, PT II, ACIIDS 2024, 2024, 14796: 224-236
[9] Big Data Clustering: A Review
Shirkhorshidi, Ali Seyed
Aghabozorgi, Saeed
Teh, Ying Wah
Herawan, Tutut
COMPUTATIONAL SCIENCE AND ITS APPLICATIONS - ICCSA 2014, PT V, 2014, 8583: 707-720
