p-PIC: Parallel power iteration clustering for big data

Cited by: 26
Authors
Yan, Weizhong [1]
Brahmakshatriya, Umang [1]
Xue, Ya [1]
Gilder, Mark [2]
Wise, Bowden [3]
Affiliations
[1] GE Global Res Ctr, Machine Learning Lab, Niskayuna, NY 12039 USA
[2] GE Global Res Ctr, Comp & Cyber Secur Lab, Niskayuna, NY 12039 USA
[3] GE Global Res Ctr, Knowledge Discovery Lab, Niskayuna, NY 12039 USA
Keywords
Big data; Clustering; Cloud computing; Data mining; Distributed computing; Machine learning; Parallel computing; Spectral clustering; Algorithm
DOI
10.1016/j.jpdc.2012.06.009
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Power iteration clustering (PIC) is a newly developed clustering algorithm. It performs clustering by embedding data points in a low-dimensional subspace derived from the similarity matrix. Compared to traditional clustering algorithms, PIC is simple, fast and relatively scalable. However, it requires that the data and its associated similarity matrix fit into memory, which makes the algorithm infeasible for big data applications. This paper attempts to expand PIC's data scalability by implementing a parallel power iteration clustering (p-PIC) algorithm. While this paper focuses on exploring different parallelization strategies and implementation details for minimizing computation and communication costs, we have also paid great attention to ensuring that the algorithm works well on low-end commodity computers (COTS-based clusters and general-purpose servers found at most commercial cloud providers). The experimental results demonstrate that the proposed p-PIC algorithm is highly scalable with respect to both data size and compute resources. © 2012 Elsevier Inc. All rights reserved.
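The abstract's description of PIC (embed the points in a low-dimensional space derived from the similarity matrix, then cluster the embedding) can be illustrated with a minimal serial sketch. This is not the paper's p-PIC implementation. It assumes the commonly described form of PIC: a row-normalized affinity matrix, a degree-based starting vector, truncated power iteration with L1 normalization, and k-means on the resulting one-dimensional embedding. The function names, the Gaussian kernel, the stopping tolerance, and the toy points are all hypothetical choices made for illustration.

```python
import math


def gaussian_affinity(points, sigma=1.0):
    """Dense affinity matrix with a Gaussian kernel (assumed choice) and zero diagonal."""
    n = len(points)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            A[i][j] = A[j][i] = math.exp(-d2 / (2.0 * sigma ** 2))
    return A


def pic_embedding(A, max_iter=50, tol=1e-6):
    """One-dimensional embedding from truncated power iteration.

    Assumes every row of A has a positive sum (no isolated points).
    """
    n = len(A)
    degree = [sum(row) for row in A]
    # Row-normalize the affinity matrix: W = D^-1 A.
    W = [[a / degree[i] for a in A[i]] for i in range(n)]
    total = sum(degree)
    v = [d / total for d in degree]  # degree-based starting vector
    prev_delta = float("inf")
    for _ in range(max_iter):
        new_v = [sum(W[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(abs(x) for x in new_v)
        new_v = [x / norm for x in new_v]  # L1 normalization
        delta = max(abs(a - b) for a, b in zip(new_v, v))
        v = new_v
        # Stop early once the change between iterations has stabilized.
        if abs(prev_delta - delta) < tol:
            break
        prev_delta = delta
    return v


def kmeans_1d(values, k, n_iter=100):
    """Plain 1-D k-means with a deterministic, order-statistic initialization."""
    srt = sorted(values)
    centers = [srt[(2 * c + 1) * len(srt) // (2 * k)] for c in range(k)]
    labels = [0] * len(values)
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda c: abs(x - centers[c])) for x in values]
        new_centers = []
        for c in range(k):
            members = [x for x, lab in zip(values, labels) if lab == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels


# Toy data (hypothetical): a group of three points and a group of four points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (11, 11)]
embedding = pic_embedding(gaussian_affinity(points))
labels = kmeans_1d(embedding, k=2)
print(labels)
```

The sketch stores the full n-by-n affinity matrix in memory, which is exactly the limitation the abstract identifies. Each entry of the matrix-vector product depends on only one row of W, so the product can in principle be split across workers by rows; the partitioning and communication scheme the paper actually uses is not described in this record.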
Pages: 352-359
Page count: 8