A framework for clustering massive graph streams

Cited by: 3

Authors:

Aggarwal C.C. [1]

Zhao Y. [2]

Yu P.S. [2]

Affiliations:

[1] IBM T. J. Watson Research Center, Hawthorne

[2] Department of Computer Science, University of Illinois at Chicago, Chicago

Source:

Statistical Analysis and Data Mining | 2010 / Vol. 3 / No. 6

Keywords:

Clustering; Data mining; Graphs;

DOI:

10.1002/sam.10090

Abstract:

In this paper, we examine the problem of clustering massive graph streams. Graph clustering poses significant challenges because of the complex structures which may be present in the underlying data. The massive size of the underlying graph makes explicit structural enumeration very difficult. Consequently, most techniques for clustering multidimensional data are difficult to generalize to the case of massive graphs. Recently, methods have been proposed for clustering graph data, though these methods are designed for static data, and are not applicable to the case of graph streams. Furthermore, these techniques are especially not effective for the case of massive graphs, since a huge number of distinct edges may need to be tracked simultaneously. This results in storage and computational challenges during the clustering process. In order to deal with the natural problems arising from the use of massive disk-resident graphs, we propose a technique for creating hash-compressed microclusters from graph streams. The compressed microclusters are designed by using a hash-based compression of the edges onto a smaller domain space. We provide theoretical results which show that the hash-based compression continues to maintain bounded accuracy in terms of distance computations. Since clustering is a data summarization technique, it can also be naturally extended to the problem of evolution analysis. We provide experimental results which illustrate the accuracy and efficiency of the underlying method. © 2010 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 3: 399-416, 2010.
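The abstract names the idea of hashing edges onto a smaller domain and keeping compressed microclusters, but it gives no algorithmic detail. The following is only a minimal sketch of that general idea, under stated assumptions: the names `edge_bucket`, `sketch_graph`, `MicroCluster`, and `cluster_stream`, the choice of BLAKE2b as the hash, the per-bucket count statistics, and the squared Euclidean distance are all hypothetical and may differ from what the paper actually uses.

```python
import hashlib
from collections import Counter


def edge_bucket(u, v, num_buckets):
    """Map an undirected edge to one of num_buckets slots.

    Assumption: a single deterministic hash; the paper's hash scheme may differ.
    """
    key = "|".join(sorted((str(u), str(v)))).encode("utf-8")
    digest = hashlib.blake2b(key, digest_size=8).digest()
    return int.from_bytes(digest, "big") % num_buckets


def sketch_graph(edges, num_buckets):
    """Compress one graph (an iterable of edges) into per-bucket edge counts."""
    return Counter(edge_bucket(u, v, num_buckets) for u, v in edges)


class MicroCluster:
    """Hypothetical summary: number of graphs plus summed bucket counts."""

    def __init__(self):
        self.count = 0
        self.bucket_sums = Counter()

    def add(self, sketch):
        self.count += 1
        self.bucket_sums.update(sketch)

    def distance(self, sketch):
        """Squared Euclidean distance from a sketch to the cluster centroid."""
        keys = set(self.bucket_sums) | set(sketch)
        return sum(
            (sketch.get(k, 0) - self.bucket_sums.get(k, 0) / self.count) ** 2
            for k in keys
        )


def cluster_stream(graphs, k, num_buckets):
    """Seed k clusters with the first k graphs, then assign each to the nearest."""
    clusters, labels = [], []
    for edges in graphs:
        sketch = sketch_graph(edges, num_buckets)
        if len(clusters) < k:
            clusters.append(MicroCluster())
            idx = len(clusters) - 1
        else:
            idx = min(range(k), key=lambda i: clusters[i].distance(sketch))
        clusters[idx].add(sketch)
        labels.append(idx)
    return labels
```

In this sketch, memory per cluster is bounded by `num_buckets` rather than by the number of distinct edges seen, which is the trade-off the abstract alludes to: distinct edges can collide in a bucket, so distances are approximate.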

Pages: 399-416

Page count: 17
