Streaming Sparse Graphs using Efficient Dynamic Sets

被引:10
作者
Wheatman, Brian [1 ]
Burns, Randal [1 ]
机构
[1] Johns Hopkins Univ, Baltimore, MD 21218 USA
来源
2021 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA) | 2021年
关键词
D O I
10.1109/BigData52589.2021.9671836
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
We present the SSTGraph framework for the storage and analysis of dynamic graphs. Its performance matches or exceeds state-of-the-art static graph engines and supports streaming updates. SSTGraph builds on top of the tinyset parallel, dynamic set data structure. Tinyset implements set membership in a shallow hierarchy of sorted packed memory arrays to achieve logarithmic time access and updates, and it scans in optimal linear time. Tinyset uses space comparable to that of systems that use data compression while avoiding compression's computation and serialization overhead. SSTGraph outperforms other streaming, dynamic graph engines on a suite of four graph algorithms. Our evaluation includes a comparison with the Aspen streaming graph system. SSTGraph reduces runtime by 40% on average, updates are 2x-5x faster on batch sizes up to 10 million, and graphs are smaller. The partitioned data structure scales well and runs on billion edge graphs in just 15 GB of memory.
引用
收藏
页码:284 / 294
页数:11
相关论文
共 42 条
[1]  
[Anonymous], 2010, P 26 C UNC ART INT
[2]  
[Anonymous], 2019, Amazon web services (aws) - cloud computing services
[3]   HipMCL: a high-performance parallel implementation of the Markov clustering algorithm for large-scale networks [J].
Azad, Ariful ;
Pavlopoulos, Georgios A. ;
Ouzounis, Christos A. ;
Kyrpides, Nikos C. ;
Buluc, Aydin .
NUCLEIC ACIDS RESEARCH, 2018, 46 (06) :E33
[4]  
Backstrom L., 2006, P 12 ACM SIGKDD INT, P44, DOI DOI 10.1145/1150402.1150412
[5]   Cache-oblivious B-trees [J].
Bender, MA ;
Demaine, ED ;
Farach-Colton, M .
SIAM JOURNAL ON COMPUTING, 2005, 35 (02) :341-358
[6]  
Besta M., 2020, The practice of streaming and dynamic graphs: Concepts, models, systems, and parallelism
[7]  
Blumofe R. D., 1996, SPAA '96. 8th Annual ACM Symposium on Parallel Algorithms and Architectures, P297, DOI 10.1145/237502.237574
[8]  
Buluç A, 2008, 2008 IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL & DISTRIBUTED PROCESSING, VOLS 1-8, P1876
[9]  
Chakrabarti D, 2004, SIAM PROC S, P442
[10]  
Cormen T. H., 2009, INTRO ALGORITHMS