Scube: Efficient Summarization for Skewed Graph Streams

被引:6
作者
Chen, Ming [1 ]
Zhou, Renxiang [1 ]
Chen, Hanhua [1 ]
Jin, Hai [1 ]
机构
[1] Huazhong Univ Sci & Technol, Natl Engn Res Ctr Big Data Technol & Syst, Cluster & Grid Comp Lab, Serv Comp Technol & Syst Lab,Sch Comp Sci & Techn, Wuhan, Peoples R China
来源
2022 IEEE 42ND INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS (ICDCS 2022) | 2022年
关键词
Graph stream; summarization; skewness;
D O I
10.1109/ICDCS54860.2022.00019
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
Graph stream, which represents an evolving graph updating as an infinite edge stream, is a special emerging graph data model widely adopted in big data analysis applications. Entirely storing the continuously produced and tremendously large-scale datasets is impractical. Therefore, graph stream summarization structures which support approximate graph stream storage and management attract much recent attention. Existing designs commonly leverage a compressive matrix and use hash-based schemes to map each edge to a bucket of the matrix. Accordingly, they store the edges associated with the same node in the same row or column of the matrix. We show that existing designs suffer from unacceptable query latency and precision in the presence of node degree skewness in graph streams. We argue that the key to efficient graph stream summarization is to identify the high-degree nodes and leverage a differentiated strategy for the associated edges. However, it is not trivial to estimate the degree of a node in real-time graph streams due to the rigorous requirements ()I' space and time efficiency. Moreover, the existence of duplicate edges makes high-degree nodes identification difficult. To solve the problem, we propose Scube, an efficient summarization structure for skewed graph streams. Two factors contribute to the efficiency of Scube. First, Scube proposes a space and computation efficient probabilistic counting scheme to identify high-degree nodes in a graph stream. Second, Scube differentiates the storage strategy for the edges associated with high-degree nodes by dynamically allocating multiple rows or columns. We conduct comprehensive experiments to evaluate the performance of Scube on large-scale real-world datasets. The results show that Scube significantly reduces the query latency over a graph stream by 48%-99%, as well as achieving acceptable query accuracy compared to the state-of-the-art designs.
引用
收藏
页码:100 / 110
页数:11
相关论文
共 31 条
[1]  
[Anonymous], 2021, SX STACKOVERFLOW
[2]  
[Anonymous], 2021, WIKI TALK
[3]  
[Anonymous], 2021, TENCENT HLTH CODE
[4]  
[Anonymous], 2021, DBPEDIA
[5]  
[Anonymous], 2021, WECHAT STAT
[6]  
Caida, 2021, CAIDA
[7]   Minimizing Inter-Server Communications by Exploiting Self-Similarity in Online Social Networks [J].
Chen, Hanhua ;
Jin, Hai ;
Wu, Shaoliang .
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2016, 27 (04) :1116-1130
[8]  
Chen M., 2022, P ICDE
[9]  
Durand M, 2003, LECT NOTES COMPUT SC, V2832, P605
[10]  
Feng G., 2021, P SIGMOD