A framework for clustering massive graph streams

Cited by: 3
Authors
Aggarwal C.C. [1]
Zhao Y. [2]
Yu P.S. [2]
Affiliations
[1] IBM T. J. Watson Research Center, Hawthorne
[2] Department of Computer Science, University of Illinois at Chicago, Chicago
Source
Statistical Analysis and Data Mining, 2010, Vol. 3, No. 6
Keywords
Clustering; Data mining; Graphs
DOI
10.1002/sam.10090
Abstract
In this paper, we examine the problem of clustering massive graph streams. Graph clustering poses significant challenges because of the complex structures that may be present in the underlying data. The massive size of the underlying graph makes explicit structural enumeration very difficult. Consequently, most techniques for clustering multidimensional data are difficult to generalize to massive graphs. Recently, methods have been proposed for clustering graph data, but these methods are designed for static data and are not applicable to graph streams. Furthermore, these techniques are especially ineffective for massive graphs, since a huge number of distinct edges may need to be tracked simultaneously. This results in storage and computational challenges during the clustering process. To deal with the natural problems arising from the use of massive disk-resident graphs, we propose a technique for creating hash-compressed microclusters from graph streams. The compressed microclusters are built by using a hash-based compression of the edges onto a smaller domain space. We provide theoretical results which show that the hash-based compression continues to maintain bounded accuracy in terms of distance computations. Since clustering is a data summarization technique, it can also be naturally extended to the problem of evolution analysis. We provide experimental results which illustrate the accuracy and efficiency of the underlying method. © 2010 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 3: 399-416, 2010.
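The abstract describes the core idea only at a high level: each graph in the stream is reduced to edge counts over a hashed domain of size m, and microclusters keep additive summaries of those compressed vectors instead of the raw edges. The Python sketch below illustrates that idea under stated assumptions and is not the authors' algorithm. The stable hash (BLAKE2b over a canonical edge string), cosine similarity, the absorption threshold, the fixed budget of k microclusters, and the least-recently-updated replacement rule are all illustrative choices, and every class and function name here is hypothetical.

```python
import hashlib
import math
from typing import List, Sequence, Tuple

Edge = Tuple[int, int]


def hash_edge(u: int, v: int, m: int) -> int:
    """Map an undirected edge to one of m buckets with a stable hash (assumed choice)."""
    a, b = sorted((u, v))  # treat (u, v) and (v, u) as the same edge
    digest = hashlib.blake2b(f"{a}-{b}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % m


def compress_graph(edges: Sequence[Edge], m: int) -> List[float]:
    """Hash a graph's edge list into an m-dimensional bucket-frequency vector."""
    vec = [0.0] * m
    for u, v in edges:
        vec[hash_edge(u, v, m)] += 1.0
    return vec


def cosine(x: Sequence[float], y: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx > 0 and ny > 0 else 0.0


class HashedMicrocluster:
    """Hypothetical summary: graph count, summed bucket vector, last update time."""

    def __init__(self, vec: Sequence[float], t: int) -> None:
        self.n = 1
        self.sums = list(vec)
        self.last_update = t

    def centroid(self) -> List[float]:
        return [s / self.n for s in self.sums]

    def absorb(self, vec: Sequence[float], t: int) -> None:
        # Additive update: only counts and sums are kept, never the raw edges.
        self.n += 1
        self.sums = [s + x for s, x in zip(self.sums, vec)]
        self.last_update = t


class HashedStreamClusterer:
    """Assign each incoming graph to its most similar microcluster (illustrative)."""

    def __init__(self, m: int, k: int, threshold: float) -> None:
        self.m, self.k, self.threshold = m, k, threshold
        self.clusters: List[HashedMicrocluster] = []

    def process(self, edges: Sequence[Edge], t: int) -> int:
        vec = compress_graph(edges, self.m)
        if self.clusters:
            sims = [cosine(vec, c.centroid()) for c in self.clusters]
            best = max(range(len(sims)), key=sims.__getitem__)
            if sims[best] >= self.threshold:
                self.clusters[best].absorb(vec, t)
                return best
        if len(self.clusters) < self.k:
            self.clusters.append(HashedMicrocluster(vec, t))
            return len(self.clusters) - 1
        # Budget full: recycle the least recently updated microcluster.
        stale = min(range(self.k), key=lambda i: self.clusters[i].last_update)
        self.clusters[stale] = HashedMicrocluster(vec, t)
        return stale
```

In use, `HashedStreamClusterer(m=256, k=10, threshold=0.8).process(edges, t)` would be called once per arriving graph and returns a microcluster index; memory grows with m and k rather than with the number of distinct edges. Because edge counts are nonnegative, bucket collisions can only add cross terms, so a hashed dot product never underestimates the exact one; the paper's own accuracy bounds for distance computations are not reproduced here.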
Pages: 399-416
Number of pages: 17
Related papers (50 in total)
  • [41] Clustering Massive-Categories and Complex Documents via Graph Convolutional Network
    Zhao, Qingchao
    Yang, Jing
    Wang, Zhengkui
    Chu, Yan
    Shan, Wen
    Tuhin, Isfaque Al Kaderi
    KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT, PT I, 2021, 12815 : 27 - 39
  • [42] Hierarchical Clustering in Graph Streams: Single-Pass Algorithms and Space Lower Bounds
    Assadi, Sepehr
    Chatziafratis, Vaggos
    Lacki, Jakub
    Mirrokni, Vahab
    Wang, Chen
    CONFERENCE ON LEARNING THEORY, VOL 178, 2022, 178
  • [43] An Efficient Clustering Framework for Massive Sensor Networking in Industrial Internet of Things
    Pokhrel, Shiva Raj
    Verma, Sandeep
    Garg, Sahil
    Sharma, Ajay K.
    Choi, Jinho
    IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, 2021, 17 (07) : 4917 - 4924
  • [44] A distributed streaming framework for edge-cloud triangle counting in graph streams
    Yang, Xu
    Song, Chao
    Gu, Jiqing
    Li, Ke
    Li, Hongwei
    KNOWLEDGE-BASED SYSTEMS, 2023, 278
  • [45] A Unifying Framework to Identify Dense Subgraphs on Streams: Graph Nuclei to Hypergraph Cores
    Gabert, Kasimir
    Pinar, Ali
    Catalyurek, Umit, V
    WSDM '21: PROCEEDINGS OF THE 14TH ACM INTERNATIONAL CONFERENCE ON WEB SEARCH AND DATA MINING, 2021, : 689 - 697
  • [46] Clustering data streams
    Guha, S
    Mishra, N
    Motwani, R
    O'Callaghan, L
    41ST ANNUAL SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE, PROCEEDINGS, 2000, : 359 - 366
  • [47] A New Algorithm Framework for the Influence Maximization Problem Using Graph Clustering
    Agra, Agostinho
    Samuco, Jose Maria
    INFORMATION, 2024, 15 (02)
  • [48] Unsupervised graph-clustering learning framework for financial news summarization
    Wang, Jun
    Tan, Jinghua
    Jin, Hanlei
    Qi, Shuo
    21ST IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS ICDMW 2021, 2021, : 719 - 726
  • [49] CoHoMo: A cluster-attribute correlation aware graph clustering framework
    Yang, Yaming
    Liu, Hongmin
    Guan, Ziyu
    He, Xiaofei
    Liu, Gaoliang
    NEUROCOMPUTING, 2020, 412 : 327 - 338
  • [50] DGCLUSTER: A Neural Framework for Attributed Graph Clustering via Modularity Maximization
    Bhowmick, Aritra
    Kosan, Mert
    Huang, Zexi
    Singh, Ambuj
    Medya, Sourav
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 10, 2024, : 11069 - 11077