Co-ClusterD: A Distributed Framework for Data Co-Clustering with Sequential Updates

Cited by: 7
Authors
Cheng, Xiang [1 ]
Su, Sen [1 ]
Gao, Lixin [2 ]
Yin, Jiangtao [2 ]
Affiliations
[1] Beijing Univ Posts & Telecommun, State Key Lab Networking & Switching Technol, Beijing 100088, Peoples R China
[2] Univ Massachusetts, Dept Elect & Comp Engn, Amherst, MA 01003 USA
Funding
US National Science Foundation; National Natural Science Foundation of China;
Keywords
Co-Clustering; concurrent updates; sequential updates; cloud computing; distributed framework;
DOI
10.1109/TKDE.2015.2451634
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Co-clustering has emerged as a powerful data mining tool for two-dimensional co-occurrence and dyadic data. However, co-clustering algorithms often require significant computational resources and have been dismissed as impractical for large data sets. Existing studies have provided strong empirical evidence that expectation-maximization (EM) algorithms (e.g., the k-means algorithm) with sequential updates can significantly reduce the computational cost without degrading the resulting solution. Motivated by this observation, we introduce sequential updates for alternate minimization co-clustering (AMCC) algorithms, which are variants of EM algorithms, and show that AMCC algorithms with sequential updates converge. We then propose two approaches to parallelize AMCC algorithms with sequential updates in a distributed environment. Both approaches are proved to maintain the convergence properties of AMCC algorithms. Based on these two approaches, we present a new distributed framework, Co-ClusterD, which supports efficient implementations of AMCC algorithms with sequential updates. We design and implement Co-ClusterD, and show its efficiency through two AMCC algorithms: fast nonnegative matrix tri-factorization (FNMTF) and information theoretic co-clustering (ITCC). We evaluate our framework on both a local cluster of machines and the Amazon EC2 cloud. Empirical results show that AMCC algorithms implemented in Co-ClusterD can achieve much faster convergence and often obtain better results than their traditional concurrent counterparts.
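The distinction the abstract draws between concurrent and sequential updates can be illustrated on plain k-means, the example the abstract itself names. The sketch below is not the paper's AMCC algorithms or the Co-ClusterD framework; it assumes that "concurrent" means recomputing centroids only after all points have been reassigned, and "sequential" means updating the affected centroids immediately after each single reassignment. All function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]

def sq_dist(a: Point, b: Point) -> float:
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def nearest(p: Point, centroids: List[Point]) -> int:
    # Index of the closest centroid (ties go to the lowest index).
    return min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))

def kmeans_concurrent(points: List[Point], init: List[Point], max_iters: int = 100):
    """Batch k-means: reassign ALL points, then recompute all centroids."""
    centroids = list(init)
    assign = [-1] * len(points)
    for it in range(1, max_iters + 1):
        new_assign = [nearest(p, centroids) for p in points]
        if new_assign == assign:
            return centroids, assign, it
        assign = new_assign
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assign) if a == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = (sum(m[0] for m in members) / len(members),
                                sum(m[1] for m in members) / len(members))
    return centroids, assign, max_iters

def kmeans_sequential(points: List[Point], init: List[Point], max_sweeps: int = 100):
    """Sequential k-means: update centroids right after EACH reassignment."""
    k = len(init)
    centroids = list(init)
    assign = [nearest(p, centroids) for p in points]
    # Running per-cluster sums and counts allow O(1) centroid updates.
    sums = [[0.0, 0.0] for _ in range(k)]
    counts = [0] * k
    for p, a in zip(points, assign):
        sums[a][0] += p[0]
        sums[a][1] += p[1]
        counts[a] += 1
    for c in range(k):
        if counts[c]:
            centroids[c] = (sums[c][0] / counts[c], sums[c][1] / counts[c])
    for sweep in range(1, max_sweeps + 1):
        moved = 0
        for i, p in enumerate(points):
            old, new = assign[i], nearest(p, centroids)
            if new == old or counts[old] == 1:
                continue  # no change, or the move would empty a cluster
            # Move the point and refresh both affected centroids at once,
            # so the next point already sees the updated centroids.
            for c, sign in ((old, -1.0), (new, 1.0)):
                sums[c][0] += sign * p[0]
                sums[c][1] += sign * p[1]
                counts[c] += int(sign)
                centroids[c] = (sums[c][0] / counts[c], sums[c][1] / counts[c])
            assign[i] = new
            moved += 1
        if moved == 0:
            return centroids, assign, sweep
    return centroids, assign, max_sweeps

if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
           (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    start = [(0.0, 0.0), (1.0, 0.0)]
    print(kmeans_concurrent(pts, start))
    print(kmeans_sequential(pts, start))
```

On small, well-separated data both variants typically reach the same partition; the sequential variant simply lets later points in a sweep benefit from centroids that already reflect earlier moves. Whether this carries over to a given co-clustering objective is what the paper itself addresses.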
Pages: 3231-3244
Page count: 14
Related papers
50 in total
  • [31] Model-based co-clustering for ordinal data
    Jacques, Julien
    Biernacki, Christophe
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2018, 123 : 101 - 115
  • [32] Model-based co-clustering for functional data
    Ben Slimen, Yosra
    Allio, Sylvain
    Jacques, Julien
    NEUROCOMPUTING, 2018, 291 : 97 - 108
  • [33] A Semi-supervised Fuzzy Co-clustering Framework and Application to Twitter Data Analysis
    Honda, Katsuhiro
    Ubukata, Seiki
    Notsu, Akira
    Takahashi, Norimitsu
    Ishikawa, Yutaka
    2015 4TH INTERNATIONAL CONFERENCE ON INFORMATICS, ELECTRONICS & VISION ICIEV 15, 2015,
  • [34] Bipartite isoperimetric graph partitioning for data co-clustering
    Rege, Manjeet
    Dong, Ming
    Fotouhi, Farshad
    DATA MINING AND KNOWLEDGE DISCOVERY, 2008, 16 (03) : 276 - 312
  • [35] CFOND: Consensus Factorization for Co-Clustering Networked Data
    Guo, Ting
    Pan, Shirui
    Zhu, Xingquan
    Zhang, Chengqi
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2019, 31 (04) : 706 - 719
  • [36] Co-Adjustment Learning for Co-Clustering
    Ji Zhang
    Hongjun Wang
    Shudong Huang
    Tianrun Li
    Peng Jin
    Ping Deng
    Qigang Zhao
    Cognitive Computation, 2021, 13 : 504 - 517
  • [37] Bipartite isoperimetric graph partitioning for data co-clustering
    Manjeet Rege
    Ming Dong
    Farshad Fotouhi
    Data Mining and Knowledge Discovery, 2008, 16 : 276 - 312
  • [38] Subspace Weighting Co-Clustering of Gene Expression Data
    Chen, Xiaojun
    Huang, Joshua Z.
    Wu, Qingyao
    Yang, Min
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2019, 16 (02) : 352 - 364
  • [39] Geosocial Co-Clustering: A Novel Framework for Geosocial Community Detection
    Kim, Jungeun
    Lee, Jae-Gil
    Lee, Byung Suk
    Liu, Jiajun
    ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY, 2020, 11 (04)
  • [40] CLR: A Collaborative Location Recommendation Framework based on Co-Clustering
    Leung, Kenneth Wai-Ting
    Lee, Dik Lun
    Lee, Wang-Chien
    PROCEEDINGS OF THE 34TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR'11), 2011, : 305 - 314