A framework for clustering massive graph streams

Cited by: 3
Authors
Aggarwal C.C. [1]
Zhao Y. [2]
Yu P.S. [2]
Affiliations
[1] IBM T. J. Watson Research Center, Hawthorne
[2] Department of Computer Science, University of Illinois at Chicago, Chicago
Source
Statistical Analysis and Data Mining | 2010 / Vol. 3 / Issue 06
Keywords
Clustering; Data mining; Graphs
DOI
10.1002/sam.10090
Abstract
In this paper, we examine the problem of clustering massive graph streams. Graph clustering poses significant challenges because of the complex structures which may be present in the underlying data. The massive size of the underlying graph makes explicit structural enumeration very difficult. Consequently, most techniques for clustering multidimensional data are difficult to generalize to the case of massive graphs. Recently, methods have been proposed for clustering graph data, though these methods are designed for static data, and are not applicable to the case of graph streams. Furthermore, these techniques are especially not effective for the case of massive graphs, since a huge number of distinct edges may need to be tracked simultaneously. This results in storage and computational challenges during the clustering process. In order to deal with the natural problems arising from the use of massive disk-resident graphs, we propose a technique for creating hash-compressed microclusters from graph streams. The compressed microclusters are designed by using a hash-based compression of the edges onto a smaller domain space. We provide theoretical results which show that the hash-based compression continues to maintain bounded accuracy in terms of distance computations. Since clustering is a data summarization technique, it can also be naturally extended to the problem of evolution analysis. We provide experimental results which illustrate the accuracy and efficiency of the underlying method. © 2010 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 3: 399-416, 2010.
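The hash-based compression described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the names `edge_bucket`, `compress_graph`, `HashedMicrocluster`, and `squared_distance`, the use of MD5 as the hash, and the squared Euclidean distance to a centroid are all assumptions chosen for illustration. The idea shown is only that each distinct edge is mapped onto a smaller domain of `num_buckets` slots, so a microcluster stores a fixed-size vector of counts instead of tracking every edge.

```python
import hashlib
from collections import Counter


def edge_bucket(u, v, num_buckets):
    """Map an undirected edge to one of num_buckets slots (hypothetical helper)."""
    # Sort the endpoints so that (u, v) and (v, u) land in the same bucket.
    a, b = sorted((str(u), str(v)))
    digest = hashlib.md5(f"{a}|{b}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def compress_graph(edges, num_buckets):
    """Compress one incoming graph into bucket -> edge-count form."""
    sketch = Counter()
    for u, v in edges:
        sketch[edge_bucket(u, v, num_buckets)] += 1
    return sketch


class HashedMicrocluster:
    """Fixed-size summary of the graphs assigned to one cluster (illustrative)."""

    def __init__(self, num_buckets):
        self.num_buckets = num_buckets
        self.counts = [0.0] * num_buckets  # per-bucket sum of edge counts
        self.num_graphs = 0

    def add(self, sketch):
        for bucket, freq in sketch.items():
            self.counts[bucket] += freq
        self.num_graphs += 1


def squared_distance(sketch, cluster):
    """Squared Euclidean distance from a compressed graph to the cluster centroid."""
    n = max(cluster.num_graphs, 1)  # an empty cluster is treated as all zeros
    return sum(
        (sketch.get(b, 0) - cluster.counts[b] / n) ** 2
        for b in range(cluster.num_buckets)
    )


if __name__ == "__main__":
    cluster = HashedMicrocluster(num_buckets=16)
    graph = [("a", "b"), ("b", "c")]
    cluster.add(compress_graph(graph, 16))
    print(squared_distance(compress_graph([("a", "b"), ("c", "d")], 16), cluster))
```

Distinct edges that collide in the same bucket are merged, so distances computed on the compressed form are approximations. The abstract states that the paper gives theoretical bounds on this loss of accuracy; the sketch makes no such guarantee.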
Pages: 399-416
Number of pages: 17
Related papers
50 in total
  • [21] A categorical data clustering framework on graph representation
    Bai, Liang
    Liang, Jiye
    PATTERN RECOGNITION, 2022, 128
  • [22] A Divide and Conquer Framework for Distributed Graph Clustering
    Yang, Wenzhuo
    Xu, Huan
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 37, 2015, 37 : 504 - 513
  • [23] A Graph Based Framework for Clustering and Characterization of SOM
    Jaziri, Rakia
    Benabdeslem, Khalid
    Elghazel, Haytham
    ARTIFICIAL NEURAL NETWORKS (ICANN 2010), PT III, 2010, 6354 : 387 - +
  • [24] CORECLUSTER: A Degeneracy Based Graph Clustering Framework
    Giatsidis, Christos
    Malliaros, Fragkiskos D.
    Thilikos, Dimitrios M.
    Vazirgiannis, Michalis
    PROCEEDINGS OF THE TWENTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2014, : 44 - 50
  • [25] Scaling clustering algorithms for massive data sets using data streams
    Nittel, S
    Leung, KT
    Braverman, A
    20TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, PROCEEDINGS, 2004, : 830 - 830
  • [26] A Framework for Accelerating Graph Convolutional Networks on Massive Datasets
    Li, Xiang
    Jin, Ruoming
    Ramnath, Rajiv
    Agrawal, Gagan
    COMPUTATIONAL DATA AND SOCIAL NETWORKS, CSONET 2021, 2021, 13116 : 79 - 92
  • [27] Tiered Sampling: An Efficient Method for Counting Sparse Motifs in Massive Graph Streams
    De Stefani, Lorenzo
    Terolli, Erisa
    Upfal, Eli
    ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA, 2021, 15 (05)
  • [28] GLCC: A General Framework for Graph-Level Clustering
    Ju, Wei
    Gu, Yiyang
    Chen, Binqi
    Sun, Gongbo
    Qin, Yifang
    Liu, Xingyuming
    Luo, Xiao
    Zhang, Ming
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 4, 2023, : 4391 - 4399
  • [29] GBAGC: A General Bayesian Framework for Attributed Graph Clustering
    Xu, Zhiqiang
    Ke, Yiping
    Wang, Yi
    Cheng, Hong
    Cheng, James
    ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA, 2014, 9 (01)
  • [30] Graph Convolutional Subspace Clustering: A Robust Subspace Clustering Framework for Hyperspectral Image
    Cai, Yaoming
    Zhang, Zijia
    Cai, Zhihua
    Liu, Xiaobo
    Jiang, Xinwei
    Yan, Qin
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2021, 59 (05): : 4191 - 4202