Parallel Techniques for Compressing and Querying Massive Social Networks

被引:0
作者
Krishna, Sudhindra Gopal [1 ]
Narasimhan, Aditya [1 ]
Radhakrishnan, Sridhar [1 ]
Sekharan, Chandra N. [2 ]
机构
[1] Univ Oklahoma, Sch Comp Sci, Norman, OK 73019 USA
[2] Texas A&M Corpus Christi, Dept Comp Sci, Corpus Christi, TX USA
来源
2023 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS, IPDPSW | 2023年
关键词
Massive Social Networks; Compression; Graph Algorithms; Time-Evolving; GRAPHS;
D O I
10.1109/IPDPSW59300.2023.00140
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
The growing popularity of social networks and the massive influx of users have made it challenging to store and process the network/graph data quickly before the properties of the graph change due to graph evolution. Storing graphs or networks that represent entities and their relationships (such as individuals and their friends/followers in a social network) becomes more difficult as the number of users increases, resulting in massive graphs that are challenging to store in standard structures like matrices or adjacency lists. Research in this field has focused on reducing the memory footprint of these large graphs and minimizing the extra memory required for processing. However, there is a trade-off between time and space, as rigorous redundancy removal to achieve a small memory footprint consumes time, and querying becomes more time-consuming when traversing compressed structures compared to matrices or adjacency lists. In this paper, we introduce a parallel technique for constructing graphs using compressed sparse rows (CSR), which offers a smaller memory footprint and allows for parallel querying algorithms, such as fetching neighbors or checking edge existence. We extend our work to include parallel time-evolving differential compression of CSR using the prefix sum approach. Additionally, we measure the speed-up gained by using multiprocessors to compress the graph data. To evaluate our techniques, we perform empirical analysis on massive anonymized graphs, including Live-Journal, Pokec, Orkut, and WebNotreDame, which are publicly available. Overall, our results demonstrate that our proposed methods achieve a smaller memory footprint and faster querying compared to traditional storage structures, with additional speedup gained (up to 83% for the biggest graph with 3.07M nodes and 117.18M edges) through the use of multiprocessors.
引用
收藏
页码:838 / 847
页数:10
相关论文
共 28 条
  • [1] Interleaved K2-tree: Indexing and Navigating Ternary Relations
    Alvarez-Garcia, Sandra
    Brisaboa, Nieves R.
    De Bernardo, Guillermo
    Navarro, Gonzalo
    [J]. 2014 DATA COMPRESSION CONFERENCE (DCC 2014), 2014, : 342 - 351
  • [2] [Anonymous], 2011, STANFORD LARGE NETWO
  • [3] Cache-oblivious B-trees
    Bender, MA
    Demaine, ED
    Farach-Colton, M
    [J]. 41ST ANNUAL SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE, PROCEEDINGS, 2000, : 399 - 409
  • [4] Blelloch G. E., 1990, CMUCS90190
  • [5] Boldi P., 2004, P 13 INT WORLD WID W, P595
  • [6] Brisaboa NR, 2014, LECT NOTES COMPUT SC, V8799, P77, DOI 10.1007/978-3-319-11918-2_8
  • [7] Compact representation of Web graphs with extended functionality
    Brisaboa, Nieves R.
    Ladra, Susana
    Navarro, Gonzalo
    [J]. INFORMATION SYSTEMS, 2014, 39 : 152 - 174
  • [8] Bui-Xuan B.-M., 2002, RR4589 INRIA
  • [9] Caro D, 2016, KNOWL INF SYST, V49, P553, DOI 10.1007/s10115-015-0908-6
  • [10] Data structures for temporal graphs based on compact sequence representations
    Caro, Diego
    Andrea Rodriguez, M.
    Brisaboa, Nieves R.
    [J]. INFORMATION SYSTEMS, 2015, 51 : 1 - 26