p-PIC: Parallel power iteration clustering for big data

Cited by: 26

Authors:

Yan, Weizhong [1]

Brahmakshatriya, Umang [1]

Xue, Ya [1]

Gilder, Mark [2]

Wise, Bowden [3]

Affiliations:

[1] GE Global Res Ctr, Machine Learning Lab, Niskayuna, NY 12039 USA

[2] GE Global Res Ctr, Comp & Cyber Secur Lab, Niskayuna, NY 12039 USA

[3] GE Global Res Ctr, Knowledge Discovery Lab, Niskayuna, NY 12039 USA

Source:

JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING | 2013 / Vol. 73 / No. 03

Keywords:

Big data; Clustering; Cloud computing; Data-mining; Distributed computing; Machine learning; Parallel computing; Spectral clustering; ALGORITHM;

DOI:

10.1016/j.jpdc.2012.06.009

Chinese Library Classification (CLC) number:

TP301 [Theory, methods];

Discipline classification code:

081202;

Abstract:

Power iteration clustering (PIC) is a newly developed clustering algorithm. It performs clustering by embedding data points in a low-dimensional subspace derived from the similarity matrix. Compared to traditional clustering algorithms, PIC is simple, fast and relatively scalable. However, it requires that the data and its associated similarity matrix fit into memory, which makes the algorithm infeasible for big data applications. This paper attempts to expand PIC's data scalability by implementing a parallel power iteration clustering (p-PIC). While this paper focuses on exploring different parallelization strategies and implementation details for minimizing computation and communication costs, we have also paid great attention to ensuring the algorithm works well on low-end commodity computers (COTS-based clusters and general purpose servers found at most commercial cloud providers). The experimental results demonstrate that the proposed p-PIC algorithm is highly scalable to both data and compute resources. © 2012 Elsevier Inc. All rights reserved.
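
The abstract does not spell out the algorithm itself. As a rough illustration only, a minimal single-machine sketch of power iteration clustering as it is commonly described (row-normalize the similarity matrix, run a truncated power iteration, then cluster the resulting one-dimensional embedding) might look like the following. The function names, the tolerance, the degree-based start vector, and the simple 1-D k-means are all assumptions made for this sketch; it is not the paper's p-PIC implementation.

```python
import numpy as np


def power_iteration_embedding(affinity, max_iter=100, tol=1e-5):
    """Return a 1-D embedding from a truncated power iteration.

    Assumes `affinity` is a square, non-negative similarity matrix in which
    every row has a positive sum (hypothetical input for this sketch).
    """
    a = np.asarray(affinity, dtype=float)
    degree = a.sum(axis=1)
    w = a / degree[:, None]        # row-normalized similarity matrix
    v = degree / degree.sum()      # degree-based start vector (an assumption)
    prev_delta = np.inf
    for _ in range(max_iter):
        v_next = w @ v             # one matrix-vector product per iteration
        v_next /= np.abs(v_next).sum()
        delta = np.abs(v_next - v).max()
        v = v_next
        # Stop early once the per-iteration change levels off, before the
        # vector collapses to a constant.
        if abs(prev_delta - delta) < tol:
            break
        prev_delta = delta
    return v


def kmeans_1d(values, k, n_iter=50):
    """Plain Lloyd's k-means on a 1-D array, seeded at evenly spaced quantiles."""
    centers = np.quantile(values, np.linspace(0.0, 1.0, k))
    labels = np.zeros(len(values), dtype=int)
    for _ in range(n_iter):
        labels = np.abs(values[:, None] - centers[None, :]).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = values[labels == j].mean()
    return labels


def pic_cluster(affinity, k):
    """Cluster points by running k-means on the power-iteration embedding."""
    return kmeans_1d(power_iteration_embedding(affinity), k)
```

In this sketch the full similarity matrix is held in memory, which is exactly the limitation the abstract points to. The product `w @ v` is computed row by row, so it is presumably the kind of step a parallel version would split across workers, though the abstract does not say which strategy the paper adopts.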


Pages: 352-359

Page count: 8

