A framework for clustering massive graph streams

被引:3
|
作者
Aggarwal C.C. [1 ]
Zhao Y. [2 ]
Yu P.S. [2 ]
机构
[1] IBM T. J. Watson Research Center, Hawthorne
[2] Department of Computer Science, University of Illinois at Chicago, Chicago
来源
Statistical Analysis and Data Mining | 2010年 / 3卷 / 06期
关键词
Clustering; Data mining; Graphs;
D O I
10.1002/sam.10090
中图分类号
学科分类号
摘要
In this paper, we examine the problem of clustering massive graph streams. Graph clustering poses significant challenges because of the complex structures which may be present in the underlying data. The massive size of the underlying graph makes explicit structural enumeration very difficult. Consequently, most techniques for clustering multidimensional data are difficult to generalize to the case of massive graphs. Recently, methods have been proposed for clustering graph data, though these methods are designed for static data, and are not applicable to the case of graph streams. Furthermore, these techniques are especially not effective for the case of massive graphs, since a huge number of distinct edges may need to be tracked simultaneously. This results in storage and computational challenges during the clustering process. In order to deal with the natural problems arising from the use of massive disk-resident graphs, we propose a technique for creating hash-compressed microclusters from graph streams. The compressed microclusters are designed by using a hash-based compression of the edges onto a smaller domain space. We provide theoretical results which show that the hash-based compression continues to maintain bounded accuracy in terms of distance computations. Since clustering is a data summarization technique, it can also be naturally extended to the problem of evolution analysis. We provide experimental results which illustrate the accuracy and efficiency of the underlying method. © 2010 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 3: 399-416, 2010.
引用
收藏
页码:399 / 416
页数:17
相关论文
共 50 条
  • [1] A Framework for Clustering Massive Text and Categorical Data Streams
    Aggarwal, Charu C.
    Yu, Philip S.
    PROCEEDINGS OF THE SIXTH SIAM INTERNATIONAL CONFERENCE ON DATA MINING, 2006, : 479 - 483
  • [2] A Framework for Clustering Massive-Domain Data Streams
    Aggarwal, Charu C.
    ICDE: 2009 IEEE 25TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, VOLS 1-3, 2009, : 102 - 113
  • [3] On Sampling from Massive Graph Streams
    Ahmed, Nesreen K.
    Duffield, Nick
    Willke, Theodore L.
    Rossi, Ryan A.
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2017, 10 (11): : 1430 - 1441
  • [4] A Parameterized Framework for Clustering Streams
    Bhatnagar, Vasudha
    Kaur, Sharanjit
    Mignet, Laurent
    INTERNATIONAL JOURNAL OF DATA WAREHOUSING AND MINING, 2009, 5 (01) : 36 - 56
  • [5] On clustering massive text and categorical data streams
    Charu C. Aggarwal
    Philip S. Yu
    Knowledge and Information Systems, 2010, 24 : 171 - 196
  • [6] Online Clustering of Massive Text Data Streams
    Maha Ben-Fares
    Parisa Rastin
    Nistor Grozavu
    Pierre Holat
    SN Computer Science, 6 (5)
  • [7] On clustering massive text and categorical data streams
    Aggarwal, Charu C.
    Yu, Philip S.
    KNOWLEDGE AND INFORMATION SYSTEMS, 2010, 24 (02) : 171 - 196
  • [8] An Adaptive Framework for Clustering Data Streams
    Chandrika
    Kumar, K. R. Ananda
    ADVANCES IN COMPUTING AND COMMUNICATIONS, PT I, 2011, 190 : 704 - +
  • [9] A probabilistic framework for graph clustering
    Luo, B
    Robles-Kelly, A
    Torsello, A
    Wilson, RC
    Hancock, ER
    2001 IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, VOL 1, PROCEEDINGS, 2001, : 912 - 919
  • [10] A general framework for mining massive data streams
    Domingos, P
    Hulten, G
    JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS, 2003, 12 (04) : 945 - 949