Parallel Techniques for Compressing and Querying Massive Social Networks

被引：0

作者：

Krishna, Sudhindra Gopal ^{[1
]}

Narasimhan, Aditya ^{[1
]}

Radhakrishnan, Sridhar ^{[1
]}

Sekharan, Chandra N. ^{[2
]}

机构：

[1] Univ Oklahoma, Sch Comp Sci, Norman, OK 73019 USA

[2] Texas A&M Corpus Christi, Dept Comp Sci, Corpus Christi, TX USA

来源：

2023 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS, IPDPSW | 2023年

关键词：

Massive Social Networks; Compression; Graph Algorithms; Time-Evolving; GRAPHS;

D O I：

10.1109/IPDPSW59300.2023.00140

中图分类号：

TP3 [计算技术、计算机技术];

学科分类号：

0812 ;

摘要：

The growing popularity of social networks and the massive influx of users have made it challenging to store and process the network/graph data quickly before the properties of the graph change due to graph evolution. Storing graphs or networks that represent entities and their relationships (such as individuals and their friends/followers in a social network) becomes more difficult as the number of users increases, resulting in massive graphs that are challenging to store in standard structures like matrices or adjacency lists. Research in this field has focused on reducing the memory footprint of these large graphs and minimizing the extra memory required for processing. However, there is a trade-off between time and space, as rigorous redundancy removal to achieve a small memory footprint consumes time, and querying becomes more time-consuming when traversing compressed structures compared to matrices or adjacency lists. In this paper, we introduce a parallel technique for constructing graphs using compressed sparse rows (CSR), which offers a smaller memory footprint and allows for parallel querying algorithms, such as fetching neighbors or checking edge existence. We extend our work to include parallel time-evolving differential compression of CSR using the prefix sum approach. Additionally, we measure the speed-up gained by using multiprocessors to compress the graph data. To evaluate our techniques, we perform empirical analysis on massive anonymized graphs, including Live-Journal, Pokec, Orkut, and WebNotreDame, which are publicly available. Overall, our results demonstrate that our proposed methods achieve a smaller memory footprint and faster querying compared to traditional storage structures, with additional speedup gained (up to 83% for the biggest graph with 3.07M nodes and 117.18M edges) through the use of multiprocessors.

引用

页码：838 / 847

页数：10

共 28 条

[1] Interleaved K2-tree: Indexing and Navigating Ternary Relations
Alvarez-Garcia, Sandra
Brisaboa, Nieves R.
De Bernardo, Guillermo
Navarro, Gonzalo
[J]. 2014 DATA COMPRESSION CONFERENCE (DCC 2014), 2014, : 342 - 351
[2] [Anonymous], 2011, STANFORD LARGE NETWO
[3] Cache-oblivious B-trees
Bender, MA
Demaine, ED
Farach-Colton, M
[J]. 41ST ANNUAL SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE, PROCEEDINGS, 2000, : 399 - 409
[4] Blelloch G. E., 1990, CMUCS90190
[5] Boldi P., 2004, P 13 INT WORLD WID W, P595
[6] Brisaboa NR, 2014, LECT NOTES COMPUT SC, V8799, P77, DOI 10.1007/978-3-319-11918-2_8
[7] Compact representation of Web graphs with extended functionality
Brisaboa, Nieves R.
Ladra, Susana
Navarro, Gonzalo
[J]. INFORMATION SYSTEMS, 2014, 39 : 152 - 174
[8] Bui-Xuan B.-M., 2002, RR4589 INRIA
[9] Caro D, 2016, KNOWL INF SYST, V49, P553, DOI 10.1007/s10115-015-0908-6
[10] Data structures for temporal graphs based on compact sequence representations
Caro, Diego
Andrea Rodriguez, M.
Brisaboa, Nieves R.
[J]. INFORMATION SYSTEMS, 2015, 51 : 1 - 26

← 1 2 3 →