BGS: Accelerate GNN training on multiple GPUs

Cited: 1
Authors
Tan, Yujuan [1 ]
Bai, Zhuoxin [1 ]
Liu, Duo [2 ]
Zeng, Zhaoyang [1 ]
Gan, Yan [1 ]
Ren, Ao [1 ]
Chen, Xianzhang [1 ]
Zhong, Kan [2 ]
Affiliations
[1] Chongqing Univ, Coll Comp Sci, Chongqing, Peoples R China
[2] Chongqing Univ, Sch Big Data & Software Engn, Chongqing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Graph neural networks; GPU; Cache; Graph partition; NVLink;
DOI
10.1016/j.sysarc.2024.103162
Chinese Library Classification
TP3 [Computing Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Emerging Graph Neural Networks (GNNs) have made significant progress in processing graph-structured data, yet existing GNN frameworks face scalability issues when training large-scale graphs on multiple GPUs. Frequent feature transfers between CPUs and GPUs are a major bottleneck, and current caching schemes do not fully account for the characteristics of multi-GPU environments, leading to inefficient feature extraction. To address these challenges, we propose BGS, an auxiliary framework that accelerates GNN training from a data perspective in multi-GPU environments. First, we introduce a novel training set partition algorithm that assigns an independent training subset to each GPU, enhancing the spatial locality of node accesses and thus the efficiency of the feature caching strategy. Second, exploiting the high-speed GPU-to-GPU communication offered by NVLink, we design a feature cache placement strategy for multi-GPU environments that improves the overall hit rate by maintaining a reasonable amount of redundant cache on each GPU. Evaluations on two representative GNN models, GCN and GraphSAGE, show that BGS significantly improves the hit rate of feature caching in multi-GPU environments and substantially reduces data loading overhead, achieving a 1.5x to 6.2x speedup over the baseline.
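The two mechanisms the abstract describes, a locality-aware training set partition and a redundancy-aware cache placement across NVLink-connected GPUs, can be illustrated with a minimal sketch. This is not BGS's published algorithm: the greedy neighbor-overlap partition, the degree-based hotness score, and all function and variable names below are illustrative assumptions.

import math

def partition_training_set(train_nodes, neighbors, num_gpus):
    """Greedy balanced partition (assumed heuristic, not the paper's):
    each training node goes to the GPU whose current subset shares the
    most neighbors with it, under a size cap that balances the load."""
    cap = math.ceil(len(train_nodes) / num_gpus)
    parts = [set() for _ in range(num_gpus)]
    for v in train_nodes:
        nbrs = set(neighbors.get(v, ()))
        candidates = [g for g in range(num_gpus) if len(parts[g]) < cap]
        # Prefer the largest neighbor overlap; break ties toward the
        # least-loaded GPU.
        best = max(candidates,
                   key=lambda g: (len(parts[g] & nbrs), -len(parts[g])))
        parts[best].add(v)
    return parts

def place_caches(parts, neighbors, access_count, capacity, redundancy):
    """Per-GPU feature cache: the hottest `redundancy` fraction of the
    budget is replicated on every GPU (cheap local hits); the remainder
    of each GPU's budget holds the hottest nodes in that GPU's own
    neighborhood, which peers can fetch over NVLink on a miss."""
    hot = sorted(access_count, key=access_count.get, reverse=True)
    shared = set(hot[:int(capacity * redundancy)])
    caches = []
    for part in parts:
        local = {u for v in part for u in neighbors.get(v, ())}
        local_hot = sorted(local - shared,
                           key=lambda u: access_count.get(u, 0),
                           reverse=True)
        budget = capacity - len(shared)
        caches.append(shared | set(local_hot[:budget]))
    return caches

if __name__ == "__main__":
    # Toy graph: node -> neighbor list; degree stands in for access hotness.
    neighbors = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
    access = {n: len(nbrs) for n, nbrs in neighbors.items()}
    parts = partition_training_set([0, 1, 2, 3, 4], neighbors, num_gpus=2)
    caches = place_caches(parts, neighbors, access, capacity=3, redundancy=0.4)
    print("partitions:", parts)   # e.g. [{0, 1, 2}, {3, 4}]
    print("caches:", caches)

The `redundancy` knob trades capacity for locality: at 1.0 every GPU mirrors the same hot set (high local hit rate, small effective cache), while at 0.0 the GPUs cache disjoint neighborhoods and rely on NVLink for remote hits, which is the balance the abstract attributes to BGS.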
Pages: 13