POSTER: ParGNN: Efficient Training for Large-Scale Graph Neural Network on GPU Clusters

Cited by: 0
Authors
Li, Shunde [1 ,2 ]
Gu, Junyu [1 ,2 ]
Wang, Jue [1 ,2 ]
Yao, Tiechui [1 ,2 ,5 ]
Liang, Zhiqiang [1 ,2 ]
Shi, Yumeng [1 ,2 ]
Li, Shigang [3 ]
Xi, Weiting [4 ]
Li, Shushen [4 ]
Zhou, Chunbao [1 ,2 ]
Wang, Yangang [1 ,2 ]
Chi, Xuebin [1 ,2 ]
Affiliations
[1] Chinese Acad Sci, Comp Network Informat Ctr, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Beijing, Peoples R China
[3] Beijing Univ Posts & Telecommun, Sch Comp Sci, Beijing, Peoples R China
[4] North China Elect Power Univ, Beijing, Peoples R China
[5] State Grid Smart Grid Res Inst Co LTD, Beijing, Peoples R China
Source
PROCEEDINGS OF THE 29TH ACM SIGPLAN ANNUAL SYMPOSIUM ON PRINCIPLES AND PRACTICE OF PARALLEL PROGRAMMING, PPOPP 2024 | 2024
Funding
National Natural Science Foundation of China
Keywords
Graph neural network; Load balancing; Data transfer hiding; Distributed training
DOI
10.1145/3627535.3638488
CLC Number
TP301 [Theory and Methods]
Discipline Code
081202
Abstract
Full-batch graph neural network (GNN) training is essential for interdisciplinary applications. Large-scale graphs are usually divided into subgraphs and distributed across multiple compute units to train GNNs. State-of-the-art load-balancing methods based on direct graph partitioning are too coarse to achieve true load balance on GPU clusters. We propose ParGNN, which employs a profiler-guided load-balancing workflow in conjunction with graph repartitioning to alleviate load imbalance and minimize communication traffic. Experiments verify that ParGNN can scale to larger clusters.
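As a rough illustration of the workflow the abstract describes, the Python sketch below profiles per-partition step times and then resizes partitions toward equal predicted work. All names, the simulated costs, the noise model, and the greedy resizing are assumptions for illustration only; the poster's actual algorithm and APIs are not given in this record.

```python
# Hypothetical sketch of one profiler-guided load-balance step, assuming the
# workflow: profile per-partition step times, then resize partitions so each
# GPU gets roughly equal predicted work. Not ParGNN's actual implementation.
import numpy as np

rng = np.random.default_rng(0)

def measured_step_times(part_sizes, per_vertex_cost):
    """Stand-in for the profiler: step time = work plus measurement noise."""
    noise = rng.uniform(0.95, 1.05, len(part_sizes))
    return part_sizes * per_vertex_cost * noise

def rebalanced_sizes(part_sizes, step_times):
    """Resize partitions inversely to their observed per-vertex cost, so each
    partition has roughly equal predicted step time. A real system would feed
    these as vertex weights to a graph partitioner (e.g., METIS) so that edge
    cuts, and hence communication traffic, also stay low."""
    per_vertex = step_times / part_sizes            # observed cost per vertex
    share = (1.0 / per_vertex) / (1.0 / per_vertex).sum()
    return np.round(share * part_sizes.sum()).astype(int)

# Toy run: 4 GPUs with equal-size partitions but skewed per-vertex costs
# (e.g., denser subgraphs take longer per vertex).
sizes = np.array([25_000, 25_000, 25_000, 25_000])
costs = np.array([1.0, 1.4, 0.8, 1.2]) * 1e-6       # seconds per vertex
times = measured_step_times(sizes, costs)
print("imbalance before:", times.max() / times.mean())

new_sizes = rebalanced_sizes(sizes, times)
new_times = measured_step_times(new_sizes, costs)
print("new partition sizes:", new_sizes)
print("imbalance after:", new_times.max() / new_times.mean())
```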
Pages: 469-471
Number of pages: 3