PSGraph: How Tencent trains extremely large-scale graphs with Spark?

Cited by: 12
Authors
Jiang, Jiawei [1]
Xiao, Pin [2]
Yu, Lele [2]
Li, Xiaosen [2]
Cheng, Jiefeng [2]
Miao, Xupeng [3]
Zhang, Zhipeng [3]
Cui, Bin [3]
Affiliations
[1] Swiss Fed Inst Technol, Dept Comp Sci, Zurich, Switzerland
[2] Tencent Inc, TEG, Data Platform, Shenzhen, Peoples R China
[3] Peking Univ, Sch EECS & MOE, Beijing, Peoples R China
Keywords
graph algorithm; Spark; parameter server;
DOI
10.1109/ICDE48307.2020.00137
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Spark has been extensively used in many applications at Tencent due to its easy deployment, pipeline capability, and close integration with the Hadoop ecosystem. As the graph computing engine of Spark, GraphX is also widely deployed to process large-scale graph data at Tencent. However, when the graph data reaches billion scale, GraphX suffers serious performance degradation. Worse, GraphX cannot support the emerging graph embedding (GE) and graph neural network (GNN) algorithms. To address these challenges, we develop a new graph processing system, called PSGraph, which uses Spark executors and PyTorch to perform computation and a distributed parameter server to store frequently accessed models. With the parameter server architecture, PSGraph can train on extremely large-scale graph data at Tencent and enables the training of GE and GNN algorithms. Moreover, PSGraph retains the advantages of Spark by staying inside the Spark ecosystem, and can directly replace GraphX without modification to the existing application framework. Our experiments show that PSGraph outperforms GraphX significantly.
Pages: 1549-1557
Number of pages: 9