Graph Stream Sketch: Summarizing Graph Streams With High Speed and Accuracy

被引:11
作者
Gou, Xiangyang [1 ]
Zou, Lei [2 ,3 ]
Zhao, Chenxingyu [1 ]
Yang, Tong [1 ]
机构
[1] Peking Univ, Beijing 100871, Peoples R China
[2] Peking Univ, Beijing Inst Big Data Res, Beijing 100871, Peoples R China
[3] Peking Univ, Natl Engn Lab Big Data Anal Technol & Applicat PKU, Beijing 100871, Peoples R China
关键词
Approximate query; data stream; sketch; graph;
D O I
10.1109/TKDE.2022.3174570
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
A graph stream is a continuous sequence of data items, in which each item indicates an edge, including its two endpoints and edge weight. It forms a dynamic graph that changes with every item. Graph streams play important roles in cyber security, social networks, cloud troubleshooting systems and more. Due to the vast volume and high update speed of graph streams, traditional data structures for graph storage such as the adjacency matrix and the adjacency list are no longer sufficient. However, prior art of graph stream summarization either supports limited kinds of queries or suffers from poor accuracy of query results. In this paper, we propose a novel Graph Stream Sketch (GSS for short) to summarize the graph streams, which has linear space cost OojEjTHORN (E is the edge set of the graph) and high update speed, and supports most kinds of queries over graph streams with controllable errors. Experimental results show that our solution is up to 142 times faster than the adjacency list when processing updates in graph streams, and its memory consumption is as small as 30% of the adjacency list. Though error is introduced as a trade off in our solution, both theoretical analysis and experiment results confirm that such error is small and controllable. The relative error is below 10(-2) in edge weight query, and the precision is above 90% is 1-hop precursor/successor queries.
引用
收藏
页码:5901 / 5914
页数:14
相关论文
共 43 条
[1]   Evolutionary Network Analysis: A Survey [J].
Aggarwal, Charu ;
Subbian, Karthik .
ACM COMPUTING SURVEYS, 2014, 47 (01)
[2]   On the streaming model augmented with a sorting primitive [J].
Aggarwal, G ;
Datar, M ;
Rajagopalan, S ;
Ruhl, M .
45TH ANNUAL IEEE SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE, PROCEEDINGS, 2004, :540-549
[3]  
Ahn K.J., 2012, PODS, P5, DOI DOI 10.1145/2213556.2213560
[4]  
Assadi S, 2017, PROCEEDINGS OF THE TWENTY-EIGHTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, P1723
[5]   Practice of Streaming Processing of Dynamic Graphs: Concepts, Models, and Systems [J].
Besta, Maciej ;
Fischer, Marc ;
Kalavri, Vasiliki ;
Kapralov, Michael ;
Hoefler, Torsten .
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2023, 34 (06) :1860-1876
[6]   Slim Graph: Practical Lossy Graph Compression for Approximate Graph Processing, Storage, and Analytics [J].
Besta, Maciej ;
Weber, Simon ;
Gianinazzi, Lukas ;
Gerstenberger, Robert ;
Ivanov, Andrey ;
Oltchik, Yishai ;
Hoefler, Torsten .
PROCEEDINGS OF SC19: THE INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS, 2019,
[7]   Space- and Time-Efficient Algorithm for Maintaining Dense Subgraphs on One-Pass Dynamic Streams [J].
Bhattacharya, Sayan ;
Henzinger, Monika ;
Nanongkai, Danupon ;
Tsourakakis, Charalampos E. .
STOC'15: PROCEEDINGS OF THE 2015 ACM SYMPOSIUM ON THEORY OF COMPUTING, 2015, :173-182
[8]  
Cheng Raymond., 2012, EuroSys, P85
[9]  
Choudhury Sutanay., 2015, EDBT
[10]   An improved data stream summary: The count-min sketch and its applications [J].
Cormode, G ;
Muthukrishnan, S .
LATIN 2004: THEORETICAL INFORMATICS, 2004, 2976 :29-38