Co-ClusterD: A Distributed Framework for Data Co-Clustering with Sequential Updates

Cited by: 7
Authors
Cheng, Xiang [1]
Su, Sen [1]
Gao, Lixin [2]
Yin, Jiangtao [2]
Affiliations
[1] Beijing Univ Posts & Telecommun, State Key Lab Networking & Switching Technol, Beijing 100088, Peoples R China
[2] Univ Massachusetts, Dept Elect & Comp Engn, Amherst, MA 01003 USA
Funding
National Science Foundation (USA); National Natural Science Foundation of China;
Keywords
Co-Clustering; concurrent updates; sequential updates; cloud computing; distributed framework;
DOI
10.1109/TKDE.2015.2451634
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Co-clustering has emerged as a powerful data mining tool for two-dimensional co-occurrence and dyadic data. However, co-clustering algorithms often require significant computational resources and have been dismissed as impractical for large data sets. Existing studies have provided strong empirical evidence that expectation-maximization (EM) algorithms (e.g., the k-means algorithm) with sequential updates can significantly reduce the computational cost without degrading the resulting solution. Motivated by this observation, we introduce sequential updates for alternate minimization co-clustering (AMCC) algorithms, which are variants of EM algorithms, and also show that AMCC algorithms with sequential updates converge. We then propose two approaches to parallelize AMCC algorithms with sequential updates in a distributed environment. Both approaches are proved to maintain the convergence properties of AMCC algorithms. Based on these two approaches, we present a new distributed framework, Co-ClusterD, which supports efficient implementations of AMCC algorithms with sequential updates. We design and implement Co-ClusterD, and show its efficiency through two AMCC algorithms: fast nonnegative matrix tri-factorization (FNMTF) and information theoretic co-clustering (ITCC). We evaluate our framework on both a local cluster of machines and the Amazon EC2 cloud. Empirical results show that AMCC algorithms implemented in Co-ClusterD can achieve much faster convergence and often obtain better results than their traditional concurrent counterparts.
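The contrast between concurrent and sequential updates that the abstract refers to can be illustrated on plain k-means, the EM example it names. The following is a minimal single-machine sketch, not the paper's AMCC or Co-ClusterD implementation; the function names (`concurrent_pass`, `sequential_pass`) and the running-sum bookkeeping are assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid closest to p (ties go to the lowest index)."""
    return min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))


def concurrent_pass(points: Sequence[Point], centroids: Sequence[Point]):
    """Concurrent (batch) update: every point is assigned against the same
    old centroids, and centroids are recomputed only after the full pass."""
    labels = [nearest(p, centroids) for p in points]
    new_centroids: List[Point] = []
    for k, old in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == k]
        if members:
            n = len(members)
            new_centroids.append(tuple(sum(c) / n for c in zip(*members)))
        else:
            new_centroids.append(old)  # keep the old centroid for an empty cluster
    return labels, new_centroids


def sequential_pass(points: Sequence[Point], labels: Sequence[int],
                    centroids: Sequence[Point]):
    """Sequential update: as soon as one point changes cluster, the two
    affected centroids are refreshed, so later points see the new values."""
    k, dim = len(centroids), len(points[0])
    labels, centroids = list(labels), list(centroids)
    # Running per-cluster sums and counts make each centroid refresh cheap.
    counts = [0] * k
    sums = [[0.0] * dim for _ in range(k)]
    for p, lab in zip(points, labels):
        counts[lab] += 1
        for d in range(dim):
            sums[lab][d] += p[d]
    for i, p in enumerate(points):
        old, new = labels[i], nearest(p, centroids)
        if new == old:
            continue
        counts[old] -= 1
        counts[new] += 1
        for d in range(dim):
            sums[old][d] -= p[d]
            sums[new][d] += p[d]
        labels[i] = new
        for j in (old, new):
            if counts[j] > 0:  # an emptied cluster keeps its last centroid
                centroids[j] = tuple(s / counts[j] for s in sums[j])
    return labels, centroids


if __name__ == "__main__":
    pts = [(0.0,), (1.0,), (9.0,), (10.0,)]
    labs, cents = concurrent_pass(pts, [(0.0,), (1.0,)])
    print(sequential_pass(pts, labs, cents))
```

In the sequential pass a point's reassignment takes effect immediately, which is the intuition behind the faster convergence the abstract reports; how the paper carries this over to row and column clusters and to a distributed setting is not shown here.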
Pages: 3231-3244
Page count: 14
Related papers
50 in total
  • [1] Co-ClusterD: A Distributed Framework for Data Co-Clustering with Sequential Updates
    Su, Sen
    Cheng, Xiang
    Gao, Lixin
    Yin, Jiangtao
    2013 IEEE 13TH INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), 2013, : 1193 - 1198
  • [2] Joint co-clustering: Co-clustering of genomic and clinical bioimaging data
    Ficarra, Elisa
    De Micheli, Giovanni
    Yoon, Sungroh
    Benini, Luca
    Macii, Enrico
    COMPUTERS & MATHEMATICS WITH APPLICATIONS, 2008, 55 (05) : 938 - 949
  • [3] A New Framework for Co-clustering of Gene Expression Data
    Zhang, Shuzhong
    Wang, Kun
    Chen, Bilian
    Huang, Xiuzhen
    PATTERN RECOGNITION IN BIOINFORMATICS, 2011, 7036 : 1 - +
  • [4] Ensemble Block Co-clustering: A Unified Framework for Text Data
    Affeldt, Severine
    Labiod, Lazhar
    Nadif, Mohamed
    CIKM '20: PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT, 2020, : 5 - 14
  • [5] A Framework for Simultaneous Co-clustering and Learning from Complex Data
    Deodhar, Meghana
    Ghosh, Joydeep
    KDD-2007 PROCEEDINGS OF THE THIRTEENTH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2007, : 250 - 259
  • [6] SemiNMF-PCA framework for Sparse Data Co-clustering
    Allab, Kais
    Labiod, Lazhar
    Nadif, Mohamed
    CIKM'16: PROCEEDINGS OF THE 2016 ACM CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, 2016, : 347 - 356
  • [7] Sleeved co-clustering of lagged data
    Shaham, Eran
    Sarne, David
    Ben-Moshe, Boaz
    KNOWLEDGE AND INFORMATION SYSTEMS, 2012, 31 (02) : 251 - 279
  • [8] Co-clustering from Tensor Data
    Boutalbi, Rafika
    Labiod, Lazhar
    Nadif, Mohamed
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2019, PT I, 2019, 11439 : 370 - 383
  • [9] Co-clustering for binary and functional data
    Ben Slimen, Yosra
    Jacques, Julien
    Allio, Sylvain
    COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION, 2022, 51 (09) : 4845 - 4866
  • [10] Sleeved co-clustering of lagged data
    Eran Shaham
    David Sarne
    Boaz Ben-Moshe
    Knowledge and Information Systems, 2012, 31 : 251 - 279