ByteGNN: Efficient Graph Neural Network Training at Large Scale

Cited by: 29
Authors
Zheng, Chenguang [1 ,2 ]
Chen, Hongzhi [2 ]
Cheng, Yuxuan [2 ]
Song, Zhezheng [1 ]
Wu, Yifan [2 ,3 ]
Li, Changji [2 ]
Cheng, James [1 ]
Yang, Hao [2 ]
Zhang, Shuai [2 ]
Affiliations
[1] Chinese Univ Hong Kong, Hong Kong, Peoples R China
[2] ByteDance Inc, Beijing, Peoples R China
[3] Peking Univ, Beijing, Peoples R China
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2022 / Vol. 15 / No. 6
Keywords
DOI
10.14778/3514061.3514069
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Graph neural networks (GNNs) have shown excellent performance in a wide range of applications such as recommendation, risk control, and drug discovery. With the increase in the volume of graph data, distributed GNN systems have become essential for efficient GNN training. However, existing distributed GNN training systems suffer from various performance issues, including high network communication cost, low CPU utilization, and poor end-to-end performance. In this paper, we propose ByteGNN, which addresses the limitations of existing distributed GNN systems with three key designs: (1) an abstraction of mini-batch graph sampling that supports high parallelism, (2) a two-level scheduling strategy that improves resource utilization and reduces end-to-end GNN training time, and (3) a graph partitioning algorithm tailored for GNN workloads. Our experiments show that ByteGNN outperforms state-of-the-art distributed GNN systems with 3.5-23.8 times faster end-to-end execution, 2-6 times higher CPU utilization, and around half the network communication cost.
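To make design (1) concrete, the short Python sketch below illustrates hop-by-hop mini-batch neighbor sampling in which each per-node sampling step is a small independent task that can run in parallel. All names here (GRAPH, FANOUTS, sample_minibatch) are invented for illustration; this is not ByteGNN's actual API, which models sampling as a DAG of fine-grained tasks, but it conveys why per-node tasks within a hop expose high parallelism.

import random
from functools import partial
from concurrent.futures import ThreadPoolExecutor

# Toy graph as an adjacency list: node id -> neighbor ids (hypothetical data).
GRAPH = {
    0: [1, 2, 3],
    1: [0, 2],
    2: [0, 1, 3],
    3: [0, 2],
}

FANOUTS = [2, 2]  # neighbors to sample per node at each hop

def sample_neighbors(node, fanout):
    # One fine-grained task: sample up to `fanout` neighbors of `node`.
    neighbors = GRAPH.get(node, [])
    return random.sample(neighbors, min(fanout, len(neighbors)))

def sample_minibatch(seeds):
    # Expand the seed set hop by hop; the per-node tasks within a hop are
    # independent, so they can run in parallel (here via a thread pool).
    frontier = list(seeds)
    layers = [frontier]
    with ThreadPoolExecutor() as pool:
        for fanout in FANOUTS:
            results = pool.map(partial(sample_neighbors, fanout=fanout), frontier)
            frontier = sorted({v for nbrs in results for v in nbrs})
            layers.append(frontier)
    return layers  # per-hop node sets that form the mini-batch subgraph

print(sample_minibatch(seeds=[0]))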
Pages: 1228-1242
Number of pages: 15