PSGraph: How Tencent trains extremely large-scale graphs with Spark?

Cited by: 12
Authors
Jiang, Jiawei [1 ]
Xiao, Pin [2 ]
Yu, Lele [2 ]
Li, Xiaosen [2 ]
Cheng, Jiefeng [2 ]
Miao, Xupeng [3 ]
Zhang, Zhipeng [3 ]
Cui, Bin [3 ]
Affiliations
[1] Swiss Fed Inst Technol, Dept Comp Sci, Zurich, Switzerland
[2] Tencent Inc, TEG, Data Platform, Shenzhen, Peoples R China
[3] Peking Univ, Sch EECS & MOE, Beijing, Peoples R China
Keywords
graph algorithm; Spark; parameter server;
DOI
10.1109/ICDE48307.2020.00137
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Spark has been used extensively in many applications at Tencent because of its easy deployment, pipeline capability, and close integration with the Hadoop ecosystem. GraphX, Spark's graph computing engine, is also widely deployed at Tencent to process large-scale graph data. However, once the graph data reaches billion scale, GraphX suffers serious performance degradation. Worse, GraphX cannot support the increasingly important graph embedding (GE) and graph neural network (GNN) algorithms. To address these challenges, we develop a new graph processing system, called PSGraph. It uses Spark executors and PyTorch to perform computation, and it stores frequently accessed models on a distributed parameter server. With this parameter server architecture, PSGraph can train on extremely large-scale graph data at Tencent and supports the training of GE and GNN algorithms. Moreover, PSGraph keeps the advantages of Spark by staying inside the Spark ecosystem, and it can directly replace GraphX without modification to the existing application framework. Our experiments show that PSGraph significantly outperforms GraphX.
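To make the parameter-server idea in the abstract concrete, here is a minimal toy sketch of the general pull/compute/push pattern. It is not PSGraph's implementation or API. `ToyParameterServer` and `worker_step` are hypothetical names, plain Python lists stand in for PyTorch tensors, and a single function call stands in for work inside a Spark executor. Several details are assumptions made only for illustration and are not stated in the abstract: modulo sharding by node id, plain SGD applied on the server, and a toy "pull endpoints together" loss.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


class ToyParameterServer:
    """Hypothetical in-memory parameter server (not PSGraph's real API):
    node embeddings are partitioned across shards by node id."""

    def __init__(self, num_shards: int, dim: int, lr: float = 0.1) -> None:
        self.num_shards = num_shards
        self.dim = dim
        self.lr = lr
        self.shards: List[Dict[int, Vector]] = [{} for _ in range(num_shards)]

    def _shard(self, key: int) -> Dict[int, Vector]:
        # Assumed modulo partitioning; a real system may use another scheme.
        return self.shards[key % self.num_shards]

    def init(self, key: int, vec: Sequence[float]) -> None:
        self._shard(key)[key] = list(vec)

    def pull(self, keys: Sequence[int]) -> Dict[int, Vector]:
        # Unknown keys are lazily created as zero vectors; copies are returned
        # so a worker cannot mutate server-side state directly.
        return {k: list(self._shard(k).setdefault(k, [0.0] * self.dim)) for k in keys}

    def push(self, grads: Dict[int, Vector]) -> None:
        # Assumed update rule: plain SGD on the server, w <- w - lr * g.
        for k, g in grads.items():
            w = self._shard(k).setdefault(k, [0.0] * self.dim)
            for i, gi in enumerate(g):
                w[i] -= self.lr * gi


def worker_step(ps: ToyParameterServer, edges: Sequence[Tuple[int, int]]) -> None:
    """One step over a partition of edges (standing in for a Spark executor).
    Toy loss per edge: 0.5 * ||w_u - w_v||^2, which pulls endpoints together."""
    nodes = sorted({n for edge in edges for n in edge})
    emb = ps.pull(nodes)  # fetch only the embeddings this partition touches
    grads: Dict[int, Vector] = {n: [0.0] * ps.dim for n in nodes}
    for u, v in edges:
        for i in range(ps.dim):
            diff = emb[u][i] - emb[v][i]
            grads[u][i] += diff  # dL/dw_u
            grads[v][i] -= diff  # dL/dw_v
    ps.push(grads)  # send accumulated gradients back to the server


if __name__ == "__main__":
    ps = ToyParameterServer(num_shards=2, dim=2, lr=0.5)
    ps.init(0, [1.0, 0.0])
    ps.init(1, [0.0, 1.0])
    worker_step(ps, [(0, 1)])
    print(ps.pull([0, 1]))  # {0: [0.5, 0.5], 1: [0.5, 0.5]}
```

In this pattern the model lives on a separate server tier, so each worker only pulls the parameters its edge partition needs instead of holding the whole model. The abstract does not specify the exact sharding, update rule, or consistency model PSGraph uses.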
Pages: 1549-1557
Number of pages: 9
Related papers
50 in total
  • [41] A Large-Scale Filter Method for Feature Selection Based on Spark
    Marone, Reine Marie
    Camara, Fode
    Ndiaye, Samba
    2017 IEEE 4TH INTERNATIONAL CONFERENCE ON SOFT COMPUTING & MACHINE INTELLIGENCE (ISCMI), 2017, : 16 - 20
  • [42] GeoMatch: Efficient Large-Scale Map Matching on Apache Spark
    Zeidan, Ayman
    Lagerspetz, Eemil
    Zhao, Kai
    Nurmi, Petteri
    Tarkoma, Sasu
    Vo, Huy T.
    2018 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2018, : 384 - 391
  • [43] Filter Large-scale Engine Data using Apache Spark
    Pirozzi, Donato
    Scarano, Vittorio
    Begg, Steven
    De Sercey, Guillaume
    Fish, Andrew
    Harvey, Andrew
    2016 IEEE 14TH INTERNATIONAL CONFERENCE ON INDUSTRIAL INFORMATICS (INDIN), 2016, : 1300 - 1305
  • [44] Conformal Prediction in Spark: Large-Scale Machine Learning with Confidence
    Capuccini, Marco
    Carlsson, Lars
    Norinder, Ulf
    Spjuth, Ola
    2015 IEEE/ACM 2ND INTERNATIONAL SYMPOSIUM ON BIG DATA COMPUTING (BDC), 2015, : 61 - 67
  • [45] Scalable Motif Counting for Large-scale Temporal Graphs
    Gao, Zhongqiang
    Cheng, Chuanqi
    Yu, Yanwei
    Cao, Lei
    Huang, Chao
    Dong, Junyu
    2022 IEEE 38TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2022), 2022, : 2656 - 2668
  • [46] LargeEA: Aligning Entities for Large-scale Knowledge Graphs
    Ge, Congcong
    Liu, Xiaoze
    Chen, Lu
    Gao, Yunjun
    Zheng, Baihua
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2021, 15 (02): : 237 - 245
  • [47] ALLIE: Active Learning on Large-scale Imbalanced Graphs
    Cui, Limeng
    Tang, Xianfeng
    Katariya, Sumeet
    Rao, Nikhil
    Agrawal, Pallav
    Subbian, Karthik
    Lee, Dongwon
    PROCEEDINGS OF THE ACM WEB CONFERENCE 2022 (WWW'22), 2022, : 690 - 698
  • [48] The Use of Weighted Graphs for Large-Scale Genome Analysis
    Zhou, Fang
    Toivonen, Hannu
    King, Ross D.
    PLOS ONE, 2014, 9 (03):
  • [49] Particle Swarm Optimization for Large-Scale Clustering on Apache Spark
    Sherar, Matthew
    Zulkernine, Farhana
    2017 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (SSCI), 2017, : 801 - 808
  • [50] A Theoretical and Experimental Comparison of Large-Scale Join Algorithms in Spark
    Phan, A.-C.
    Phan, T.-C.
    Trieu, T.-N.
    Tran, T.-T.-Q.
    SN Computer Science, 2021, 2 (5)