PSGraph: How Tencent trains extremely large-scale graphs with Spark?

Cited by: 12
Authors
Jiang, Jiawei [1]
Xiao, Pin [2]
Yu, Lele [2]
Li, Xiaosen [2]
Cheng, Jiefeng [2]
Miao, Xupeng [3]
Zhang, Zhipeng [3]
Cui, Bin [3]
Affiliations
[1] Swiss Fed Inst Technol, Dept Comp Sci, Zurich, Switzerland
[2] Tencent Inc, TEG, Data Platform, Shenzhen, Peoples R China
[3] Peking Univ, Sch EECS & MOE, Beijing, Peoples R China
Keywords
graph algorithm; Spark; parameter server
DOI
10.1109/ICDE48307.2020.00137
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Spark has been used extensively in many Tencent applications because of its easy deployment, pipeline capability, and close integration with the Hadoop ecosystem. As Spark's graph computing engine, GraphX is also widely deployed to process large-scale graph data at Tencent. However, when the graph data reaches billion scale, GraphX suffers serious performance degradation. Worse, GraphX cannot support the emerging graph embedding (GE) and graph neural network (GNN) algorithms. To address these challenges, we develop a new graph processing system, called PSGraph, which uses Spark executors and PyTorch to perform computation and a distributed parameter server to store frequently accessed models. With the parameter server architecture, PSGraph can train on extremely large-scale graph data at Tencent and enables the training of GE and GNN algorithms. Moreover, PSGraph retains the advantages of Spark by staying inside the Spark ecosystem, and can directly replace GraphX without modification to the existing application framework. Our experiments show that PSGraph outperforms GraphX significantly.
Pages: 1549-1557
Page count: 9