A framework for clustering massive graph streams

Authors
Aggarwal C.C. [1]
Zhao Y. [2]
Yu P.S. [2]
Affiliations
[1] IBM T. J. Watson Research Center, Hawthorne
[2] Department of Computer Science, University of Illinois at Chicago, Chicago
Source
Statistical Analysis and Data Mining | 2010 / Vol. 3 / No. 6
Keywords
Clustering; Data mining; Graphs
DOI
10.1002/sam.10090
Abstract
In this paper, we examine the problem of clustering massive graph streams. Graph clustering poses significant challenges because of the complex structures that may be present in the underlying data, and the massive size of the underlying graph makes explicit structural enumeration very difficult. Consequently, most techniques for clustering multidimensional data are difficult to generalize to massive graphs. Methods have recently been proposed for clustering graph data, but they are designed for static data and are not applicable to graph streams. Furthermore, these techniques are particularly ineffective for massive graphs, since a huge number of distinct edges may need to be tracked simultaneously, which creates storage and computational challenges during the clustering process. To deal with the problems that naturally arise from massive disk-resident graphs, we propose a technique for creating hash-compressed microclusters from graph streams. The compressed microclusters are constructed by hashing the edges onto a smaller domain space. We provide theoretical results showing that the hash-based compression still maintains bounded accuracy in distance computations. Since clustering is a data summarization technique, it also extends naturally to the problem of evolution analysis. We provide experimental results that illustrate the accuracy and efficiency of the method. © 2010 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 3: 399-416, 2010.
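The abstract only outlines the idea of hashing edges onto a smaller domain to build compressed microclusters. The Python sketch below illustrates that general idea under assumptions of our own and is not the authors' algorithm. Each stream element is assumed to be a graph given as a set of edges. Edges are assumed to be hashed with MD5 into `m` buckets. A microcluster is assumed to keep a graph count plus per-bucket edge counts, and similarity is taken to be the average hashed fraction of member graphs that contain each incoming edge. The names `hash_edge`, `HashedMicrocluster`, and `cluster_stream`, and the parameters `k` and `m`, are hypothetical and used for illustration only.

```python
import hashlib
from typing import Iterable, List, Set, Tuple

Edge = Tuple[str, str]


def hash_edge(edge: Edge, m: int) -> int:
    """Map an undirected edge to one of m buckets with a stable (unsalted) hash."""
    u, v = sorted(edge)  # (a, b) and (b, a) land in the same bucket
    digest = hashlib.md5(f"{u}|{v}".encode("utf-8")).hexdigest()
    return int(digest, 16) % m


class HashedMicrocluster:
    """Hypothetical compressed summary: a graph count plus per-bucket edge counts."""

    def __init__(self, m: int) -> None:
        self.m = m
        self.num_graphs = 0
        self.bucket_counts = [0] * m  # O(m) memory, independent of distinct edges

    def add(self, edges: Set[Edge]) -> None:
        """Fold one graph (a set of edges) into the summary."""
        for e in edges:
            self.bucket_counts[hash_edge(e, self.m)] += 1
        self.num_graphs += 1

    def similarity(self, edges: Set[Edge]) -> float:
        """Average hashed fraction of member graphs containing each input edge.

        Colliding edges add to the same bucket, so the estimate can only be
        at or above the value computed from exact per-edge frequencies.
        """
        if self.num_graphs == 0 or not edges:
            return 0.0
        total = sum(self.bucket_counts[hash_edge(e, self.m)] for e in edges)
        return total / (len(edges) * self.num_graphs)


def cluster_stream(
    stream: Iterable[Set[Edge]], k: int, m: int
) -> Tuple[List[HashedMicrocluster], List[int]]:
    """Seed k microclusters with the first k graphs, then assign each later
    graph to its most similar microcluster and update that summary in place."""
    clusters: List[HashedMicrocluster] = []
    assignments: List[int] = []
    for edges in stream:
        if len(clusters) < k:
            seed = HashedMicrocluster(m)
            seed.add(edges)
            clusters.append(seed)
            assignments.append(len(clusters) - 1)
            continue
        # Ties go to the lowest-index cluster (max returns the first maximum).
        best = max(range(k), key=lambda i: clusters[i].similarity(edges))
        clusters[best].add(edges)
        assignments.append(best)
    return clusters, assignments


if __name__ == "__main__":
    graphs = [
        {("a", "b"), ("b", "c")},
        {("x", "y"), ("y", "z")},
        {("a", "b"), ("b", "c"), ("c", "d")},
        {("x", "y"), ("y", "z"), ("z", "w")},
    ]
    _, labels = cluster_stream(graphs, k=2, m=64)
    print(labels)  # cluster index assigned to each graph in stream order
```

In this particular sketch, collisions can only inflate bucket counts, so the hashed similarity never underestimates the similarity computed from exact edge frequencies. The memory per microcluster is fixed at `m` counters no matter how many distinct edges the stream contains. The accuracy guarantees stated in the abstract apply to the paper's own construction, which may differ from this sketch.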
Pages: 399-416
Number of pages: 18