ByteGNN: Efficient Graph Neural Network Training at Large Scale

Cited by: 29
Authors
Zheng, Chenguang [1 ,2 ]
Chen, Hongzhi [2 ]
Cheng, Yuxuan [2 ]
Song, Zhezheng [1 ]
Wu, Yifan [2 ,3 ]
Li, Changji [2 ]
Cheng, James [1 ]
Yang, Hao [2 ]
Zhang, Shuai [2 ]
Affiliations
[1] Chinese Univ Hong Kong, Hong Kong, Peoples R China
[2] ByteDance Inc, Beijing, Peoples R China
[3] Peking Univ, Beijing, Peoples R China
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2022 / Vol. 15 / No. 6
Keywords
DOI
10.14778/3514061.3514069
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Graph neural networks (GNNs) have shown excellent performance in a wide range of applications such as recommendation, risk control, and drug discovery. With the increase in the volume of graph data, distributed GNN systems have become essential for efficient GNN training. However, existing distributed GNN training systems suffer from various performance issues, including high network communication cost, low CPU utilization, and poor end-to-end performance. In this paper, we propose ByteGNN, which addresses the limitations of existing distributed GNN systems with three key designs: (1) an abstraction of mini-batch graph sampling that supports high parallelism, (2) a two-level scheduling strategy that improves resource utilization and reduces end-to-end GNN training time, and (3) a graph partitioning algorithm tailored for GNN workloads. Our experiments show that ByteGNN outperforms state-of-the-art distributed GNN systems with 3.5-23.8 times faster end-to-end execution, 2-6 times higher CPU utilization, and around half the network communication cost.
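To make design (1) concrete, the short Python sketch below illustrates hop-by-hop mini-batch neighbor sampling in which each per-node sampling step is a small independent task that can run in parallel. All names here (GRAPH, FANOUTS, sample_minibatch) are invented for illustration; this is not ByteGNN's actual API, which models sampling as a DAG of fine-grained tasks, but it conveys why per-node tasks within a hop expose high parallelism.

import random
from functools import partial
from concurrent.futures import ThreadPoolExecutor

# Toy graph as an adjacency list: node id -> neighbor ids (hypothetical data).
GRAPH = {
    0: [1, 2, 3],
    1: [0, 2],
    2: [0, 1, 3],
    3: [0, 2],
}

FANOUTS = [2, 2]  # neighbors to sample per node at each hop

def sample_neighbors(node, fanout):
    # One fine-grained task: sample up to `fanout` neighbors of `node`.
    neighbors = GRAPH.get(node, [])
    return random.sample(neighbors, min(fanout, len(neighbors)))

def sample_minibatch(seeds):
    # Expand the seed set hop by hop; the per-node tasks within a hop are
    # independent, so they can run in parallel (here via a thread pool).
    frontier = list(seeds)
    layers = [frontier]
    with ThreadPoolExecutor() as pool:
        for fanout in FANOUTS:
            results = pool.map(partial(sample_neighbors, fanout=fanout), frontier)
            frontier = sorted({v for nbrs in results for v in nbrs})
            layers.append(frontier)
    return layers  # per-hop node sets that form the mini-batch subgraph

print(sample_minibatch(seeds=[0]))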
Pages: 1228-1242
Number of pages: 15