p-PIC: Parallel power iteration clustering for big data

Cited by: 26

Authors:

Yan, Weizhong [1]

Brahmakshatriya, Umang [1]

Xue, Ya [1]

Gilder, Mark [2]

Wise, Bowden [3]

Affiliations:

[1] GE Global Res Ctr, Machine Learning Lab, Niskayuna, NY 12039 USA

[2] GE Global Res Ctr, Comp & Cyber Secur Lab, Niskayuna, NY 12039 USA

[3] GE Global Res Ctr, Knowledge Discovery Lab, Niskayuna, NY 12039 USA

Source:

JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING | 2013 / Vol. 73 / Issue 3

Keywords:

Big data; Clustering; Cloud computing; Data-mining; Distributed computing; Machine learning; Parallel computing; Spectral clustering; ALGORITHM;

DOI:

10.1016/j.jpdc.2012.06.009

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods];

Subject classification code:

081202 ;

Abstract:

Power iteration clustering (PIC) is a newly developed clustering algorithm. It performs clustering by embedding data points in a low-dimensional subspace derived from the similarity matrix. Compared to traditional clustering algorithms, PIC is simple, fast and relatively scalable. However, it requires that the data and its associated similarity matrix fit into memory, which makes the algorithm infeasible for big data applications. This paper attempts to expand PIC's data scalability by implementing a parallel power iteration clustering (p-PIC) algorithm. While this paper focuses on exploring different parallelization strategies and implementation details for minimizing computation and communication costs, we have also paid great attention to ensuring that the algorithm works well on low-end commodity computers (COTS-based clusters and general-purpose servers found at most commercial cloud providers). The experimental results demonstrate that the proposed p-PIC algorithm is highly scalable with respect to both data and compute resources. © 2012 Elsevier Inc. All rights reserved.
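
As a rough illustration of the serial PIC idea summarized in the abstract (not the paper's p-PIC implementation), the sketch below builds a similarity matrix, runs a truncated power iteration on its row-normalized form to obtain a one-dimensional embedding, and clusters that embedding. The function names, the Gaussian similarity, the degree-based starting vector, the stopping threshold, and the small 1-D k-means are all assumptions chosen for the example. Note that the dense n-by-n matrix `A` is exactly the memory bottleneck the abstract mentions.

```python
import numpy as np

def gaussian_affinity(X, sigma=1.0):
    # Assumed similarity: Gaussian kernel on squared Euclidean distance.
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    A = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)  # no self-similarity
    return A

def power_iteration_embedding(A, max_iter=100, eps=None):
    # Truncated power iteration on the row-normalized similarity matrix.
    n = A.shape[0]
    if eps is None:
        eps = 1e-5 / n  # assumed threshold on the change in step size
    degrees = A.sum(axis=1)
    W = A / degrees[:, None]          # each row of W sums to 1
    v = degrees / degrees.sum()       # assumed degree-based start vector
    prev_delta = np.inf
    for _ in range(max_iter):
        v_new = W @ v
        v_new /= np.abs(v_new).sum()  # L1 normalization
        delta = np.abs(v_new - v).max()
        v = v_new
        # Stop early, once the step size itself has stopped changing.
        if abs(prev_delta - delta) < eps:
            break
        prev_delta = delta
    return v

def kmeans_1d(v, k, n_iter=50):
    # Minimal 1-D k-means with evenly spaced initial centers.
    centers = np.linspace(v.min(), v.max(), k)
    labels = np.zeros(v.shape[0], dtype=int)
    for _ in range(n_iter):
        labels = np.argmin(np.abs(v[:, None] - centers[None, :]), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = v[labels == j].mean()
    return labels

def pic(X, k, sigma=1.0):
    # Hypothetical end-to-end helper: similarity, embedding, clustering.
    A = gaussian_affinity(X, sigma)
    v = power_iteration_embedding(A)
    return kmeans_1d(v, k)

# Toy usage: two tight, well-separated groups of different sizes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(5.0, 0.1, (40, 2))])
labels = pic(X, 2)
```

In this sketch the per-iteration cost is dominated by the matrix-vector product `W @ v`, which is the kind of step that can be split by rows across workers; how p-PIC actually partitions the work is described in the paper itself, not here.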


Pages: 352-359

Page count: 8

50 items in total

[21] Fast and effective Big Data exploration by clustering
Ianni, Michele
Masciari, Elio
Mazzeo, Giuseppe M.
Mezzanzanica, Mario
Zaniolo, Carlo
FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2020, 102 : 84 - 94
[22] Adaptive Power Iteration Clustering
Liu, Bo
Liu, Yong
Zhang, Huiyan
Xu, Yonghui
Tang, Can
Tang, Lianggui
Qin, Huafeng
Miao, Chunyan
KNOWLEDGE-BASED SYSTEMS, 2021, 225
[23] Parallel Clustering of Big Data of Spatio-temporal Trajectory
Hu, Chunchun
Kang, Xionghua
Luo, Nianxue
Zhao, Qiansheng
2015 11TH INTERNATIONAL CONFERENCE ON NATURAL COMPUTATION (ICNC), 2015, : 769 - 774
[24] Big data mining with parallel computing: A comparison of distributed and MapReduce methodologies
Tsai, Chih-Fong
Lin, Wei-Chao
Ke, Shih-Wen
JOURNAL OF SYSTEMS AND SOFTWARE, 2016, 122 : 83 - 92
[25] Strategies for Big Data Clustering
Kurasova, Olga
Marcinkevicius, Virginijus
Medvedev, Viktor
Rapecka, Aurimas
Stefanovic, Pavel
2014 IEEE 26TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI), 2014, : 740 - 747
[26] Big Data and Clustering Algorithms
Ajin, V. W.
Kumar, Lekshmy D.
2016 INTERNATIONAL CONFERENCE ON RESEARCH ADVANCES IN INTEGRATED NAVIGATION SYSTEMS (RAINS), 2016,
[27] Big Data clustering validity
Tlili, Monia
Hamdani, Tarek M.
2014 6TH INTERNATIONAL CONFERENCE OF SOFT COMPUTING AND PATTERN RECOGNITION (SOCPAR), 2014, : 348 - 352
[28] The application of parallel clustering analysis based on big data mining in physical community discovery
Wu, Fan
Zhou, Rui
INTERNATIONAL JOURNAL OF SYSTEM ASSURANCE ENGINEERING AND MANAGEMENT, 2022, 13 (SUPPL 3) : 1054 - 1062
[29] A Modified Hybrid Fuzzy Clustering Method for Big Data
Khoshkbarchi, Amir
Kamali, Ali
Amjadi, Mehdi
Haeri, Maryam Amir
2016 8TH INTERNATIONAL SYMPOSIUM ON TELECOMMUNICATIONS (IST), 2016, : 196 - 201
[30] Clustering Application for Streaming Big Data in Smart Grid
Banga, Alisha
Sinha, Amrita
PROCEEDINGS OF THE 2018 IEEE INTERNATIONAL CONFERENCE ON COMMUNICATION AND SIGNAL PROCESSING (ICCSP), 2018, : 1051 - 1054
