Co-ClusterD: A Distributed Framework for Data Co-Clustering with Sequential Updates

Cited by: 7
Authors
Cheng, Xiang [1]
Su, Sen [1]
Gao, Lixin [2]
Yin, Jiangtao [2]
Affiliations
[1] Beijing Univ Posts & Telecommun, State Key Lab Networking & Switching Technol, Beijing 100088, Peoples R China
[2] Univ Massachusetts, Dept Elect & Comp Engn, Amherst, MA 01003 USA
Funding
National Science Foundation (USA); National Natural Science Foundation of China
Keywords
Co-Clustering; concurrent updates; sequential updates; cloud computing; distributed framework;
DOI
10.1109/TKDE.2015.2451634
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Co-clustering has emerged as a powerful data mining tool for two-dimensional co-occurrence and dyadic data. However, co-clustering algorithms often require significant computational resources and have been dismissed as impractical for large data sets. Existing studies have provided strong empirical evidence that expectation-maximization (EM) algorithms (e.g., the k-means algorithm) with sequential updates can significantly reduce the computational cost without degrading the resulting solution. Motivated by this observation, we introduce sequential updates for alternate minimization co-clustering (AMCC) algorithms, which are variants of EM algorithms, and show that AMCC algorithms with sequential updates converge. We then propose two approaches to parallelizing AMCC algorithms with sequential updates in a distributed environment. Both approaches are proved to maintain the convergence properties of AMCC algorithms. Based on these two approaches, we present a new distributed framework, Co-ClusterD, which supports efficient implementations of AMCC algorithms with sequential updates. We design and implement Co-ClusterD, and show its efficiency through two AMCC algorithms: fast nonnegative matrix tri-factorization (FNMTF) and information theoretic co-clustering (ITCC). We evaluate our framework on both a local cluster of machines and the Amazon EC2 cloud. Empirical results show that AMCC algorithms implemented in Co-ClusterD converge much faster and often obtain better results than their traditional concurrent counterparts.
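The distinction the abstract draws between concurrent and sequential updates can be illustrated on plain k-means, the EM example it names. The sketch below is not the paper's AMCC algorithms and not Co-ClusterD code; the function names (`concurrent_pass`, `sequential_pass`) and the 2-D toy data are assumptions made for illustration. A concurrent pass assigns every point against fixed centroids and only then recomputes them; a sequential pass updates the affected centroids immediately after each reassignment, so later points see fresher cluster information.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def sq_dist(a: Point, b: Point) -> float:
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def nearest(p: Point, centroids: Sequence[Point]) -> int:
    # Index of the closest centroid (ties go to the lowest index).
    return min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))


def concurrent_pass(points: Sequence[Point], centroids: Sequence[Point]):
    """Batch update: all assignments use the same, fixed centroids."""
    labels = [nearest(p, centroids) for p in points]
    new_centroids: List[Point] = []
    for k, old in enumerate(centroids):
        members = [p for p, l in zip(points, labels) if l == k]
        if members:
            n = len(members)
            new_centroids.append((sum(m[0] for m in members) / n,
                                  sum(m[1] for m in members) / n))
        else:
            new_centroids.append(old)  # keep an empty cluster's centroid
    return labels, new_centroids


def sequential_pass(points: Sequence[Point], labels: Sequence[int], k: int):
    """Sequential update: centroids change right after each reassignment.

    Assumes every cluster 0..k-1 has at least one member in `labels`.
    """
    labels = list(labels)
    sums = [[0.0, 0.0] for _ in range(k)]
    counts = [0] * k
    for p, l in zip(points, labels):
        sums[l][0] += p[0]
        sums[l][1] += p[1]
        counts[l] += 1
    for i, p in enumerate(points):
        centroids = [(s[0] / c, s[1] / c) for s, c in zip(sums, counts)]
        old, new = labels[i], nearest(p, centroids)
        # Never move the last member out of a cluster (keeps counts > 0).
        if new != old and counts[old] > 1:
            for d in (0, 1):
                sums[old][d] -= p[d]
                sums[new][d] += p[d]
            counts[old] -= 1
            counts[new] += 1
            labels[i] = new
    centroids = [(s[0] / c, s[1] / c) for s, c in zip(sums, counts)]
    return labels, centroids


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
seq_labels, seq_centroids = sequential_pass(points, [0, 0, 0, 1], k=2)
```

In the usage lines, the third point is moved to cluster 1 during the pass, and the fourth point is then compared against centroids that already reflect that move. This simple move rule is only meant to show the update order; it makes no claim about the convergence guarantees that the paper proves for AMCC.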
Pages: 3231 - 3244
Page count: 14
Related papers
50 in total
  • [21] Bayesian Co-clustering
    Shan, Hanhuai
    Banerjee, Arindam
    ICDM 2008: EIGHTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2008, : 530 - 539
  • [22] Adaptive Spectral Co-clustering for Multiview Data
    Son, Jeong-Woo
    Jeon, Junekey
    Lee, Sang-Yun
    Kim, Sun-Joong
    2016 18TH INTERNATIONAL CONFERENCE ON ADVANCED COMMUNICATIONS TECHNOLOGY (ICACT) - INFORMATION AND COMMUNICATIONS FOR SAFE AND SECURE LIFE, 2016, : 447 - 450
  • [23] A scalable collaborative filtering framework based on co-clustering
    George, T
    Merugu, S
    FIFTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2005, : 625 - 628
  • [24] A Survey of Co-Clustering
    Wang, Hongjun
    Song, Yi
    Chen, Wei
    Luo, Zhipeng
    Li, Chongshou
    Li, Tianrui
    ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA, 2024, 18 (09)
  • [25] Co-Clustering on Manifolds
    Gu, Quanquan
    Zhou, Jie
    KDD-09: 15TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2009, : 359 - 367
  • [27] Directional co-clustering
    Salah, Aghiles
    Nadif, Mohamed
    ADVANCES IN DATA ANALYSIS AND CLASSIFICATION, 2019, 13 (03) : 591 - 620
  • [28] Bayesian co-clustering
    Domeniconi, Carlotta
    Laskey, Kathryn
    WILEY INTERDISCIPLINARY REVIEWS-COMPUTATIONAL STATISTICS, 2015, 7 (05) : 347 - 356
  • [29] A Deterministic Clustering Framework in MMMs-Induced Fuzzy Co-clustering
    Oshio, Shunnya
    Honda, Katsuhiro
    Ubukata, Seiki
    Notsu, Akira
    INTEGRATED UNCERTAINTY IN KNOWLEDGE MODELLING AND DECISION MAKING, IUKM 2015, 2015, 9376 : 204 - 213
  • [30] Privacy Preserving Fuzzy Co-clustering with Distributed Cooccurrence Matrices
    Tanaka, Daiji
    Oda, Toshiya
    Honda, Katsuhiro
    Notsu, Akira
    2014 JOINT 7TH INTERNATIONAL CONFERENCE ON SOFT COMPUTING AND INTELLIGENT SYSTEMS (SCIS) AND 15TH INTERNATIONAL SYMPOSIUM ON ADVANCED INTELLIGENT SYSTEMS (ISIS), 2014, : 700 - 705