Sliding-ITeM: An Adaptive-Size Graph Stream Summarization Structure Based on Sliding Windows

被引:0
作者
Wang, Yacheng [1 ]
Chen, Hanhua [1 ]
Chen, Hao [1 ]
Jin, Hai [1 ]
机构
[1] Huazhong Univ Sci & Technol, Serv Comp Technol & Syst Lab, Natl Engn Res Ctr Big Data Technol & Syst, Sch Comp Sci & Technol,Cluster & Grid Comp Lab, 1037 Luoyu Rd, Wuhan 430074, Peoples R China
关键词
Graph stream; Summarization; Sliding window; Approximate query;
D O I
10.1007/s41019-024-00279-2
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Graph stream is the model used to represent evolving graph data over time, which can be represented as a sequence of edge streams containing temporal information. To effectively manage an ultra large scale graph stream, existing designs usually use summarization structures based on compressed matrices to support approximate storage and querying of graph streams. However, the state-of-the-art structures are based on limited-sized compressed matrix. When dealing with dynamical graph stream data, they either use an extra adjacency list outside the compressed matrix to store left-over edges whose expected buckets in the matrix have been occupied by other previously inserted edges, or allocate new building blocks of compressed matrices to provide more space capacity. Such designs suffer from linear lookup time caused by the adjacency list or long system blocking time caused by data movement during structure scaling. Moreover, in graph stream applications with dynamically growing data sizes, recent data commonly carries greater significance and value. Existing designs fail to store the time information of items of graph streams in a space efficient way and leave recent data management over graph streams an unsolved problem. To address the dynamically expanding graph stream with the ability to accentuate the importance of recent data, in this work, we propose Sliding-ITeM, a novel adaptive-size graph stream summarization structure with a sliding window model. Two factors contribute to the efficiency of Sliding-ITeM. First, Sliding-ITeM proposes a novel fingerprint suffix index tree (FSIT) structure to efficiently manage the items assigned to a same bucket of a compressed matrix in a fine-grained and scalable way. It thus achieves time and space efficiency for graph stream management as well as avoiding costly blocking time for structure extending. Second, Sliding-ITeM divides continuous time into discrete time slices and stores items belong to different time slices in separate ITeM compressed matrices. Sliding-ITeM organizes the compressed matrices into a chain style chronologically and achieves efficient obtaining of value from recent data as well as removal of expired data following a sliding-window model. We conduct comprehensive experiments over large-scale graph stream data collected from real world systems to evaluate the performance of Sliding-ITeM. Experimental results show that it significantly reduces the operation latency by more than 67% in sliding window queries compared to state-of-the-art designs, while greatly reducing the system blocking duration by three orders of magnitude.
引用
收藏
页数:27
相关论文
共 34 条
[1]   Sketch-Based Anomaly Detection in Streaming Graphs [J].
Bhatia, Siddharth ;
Wadhwa, Mohit ;
Kawaguchi, Kenji ;
Shah, Neil ;
Yu, Philip S. ;
Hooi, Bryan .
PROCEEDINGS OF THE 29TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2023, 2023, :93-104
[2]   Scube: Efficient Summarization for Skewed Graph Streams [J].
Chen, Ming ;
Zhou, Renxiang ;
Chen, Hanhua ;
Jin, Hai .
2022 IEEE 42ND INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS (ICDCS 2022), 2022, :100-110
[3]   Horae: A Graph Stream Summarization Structure for Efficient Temporal Range Query [J].
Chen, Ming ;
Zhou, Renxiang ;
Chen, Hanhua ;
Xiao, Jiang ;
Jin, Hai ;
Li, Bo .
2022 IEEE 38TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2022), 2022, :2792-2804
[4]   Efficiently Answering Reachability and Path Queries on Temporal Bipartite Graphs [J].
Chen, Xiaoshuang ;
Wang, Kai ;
Lin, Xuemin ;
Zhang, Wenjie ;
Qin, Lu ;
Zhang, Ying .
PROCEEDINGS OF THE VLDB ENDOWMENT, 2021, 14 (10) :1845-1858
[5]  
Ftaimi S, 2021, P 4 INT C NETW INF S, P1
[6]   Sliding Sketches: A Framework using Time Zones for Data Stream Processing in Sliding Windows [J].
Gou, Xiangyang ;
He, Long ;
Zhang, Yinda ;
Wang, Ke ;
Liu, Xilai ;
Yang, Tong ;
Wang, Yi ;
Cui, Bin .
KDD '20: PROCEEDINGS OF THE 26TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2020, :1015-1025
[7]   Fast and Accurate Graph Stream Summarization [J].
Gou, Xiangyang ;
Zou, Lei ;
Zhao, Chenxingyu ;
Yang, Tong .
2019 IEEE 35TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2019), 2019, :1118-1129
[8]   SBG-Sketch: A Self-Balanced Sketch for Labeled-Graph Stream Summarization [J].
Hassan, Mohamed S. ;
Ribeiro, Bruno ;
Aref, Walid G. .
30TH INTERNATIONAL CONFERENCE ON SCIENTIFIC AND STATISTICAL DATABASE MANAGEMENT (SSDBM 2018), 2018,
[9]   DMatrix: Toward fast and accurate queries in graph stream [J].
Hou, Changsheng ;
Hou, Bingnan ;
Zhou, Tongqing ;
Cai, Zhiping .
COMPUTER NETWORKS, 2021, 198
[10]  
Huang Chu, 2023, Database Systems for Advanced Applications: 28th International Conference, DASFAA 2023, Proceedings. Lecture Notes in Computer Science (13945), P154, DOI 10.1007/978-3-031-30675-4_11