BGS: Accelerate GNN training on multiple GPUs

Cited by: 1
Authors
Tan, Yujuan [1 ]
Bai, Zhuoxin [1 ]
Liu, Duo [2 ]
Zeng, Zhaoyang [1 ]
Gan, Yan [1 ]
Ren, Ao [1 ]
Chen, Xianzhang [1 ]
Zhong, Kan [2 ]
Affiliations
[1] Chongqing Univ, Coll Comp Sci, Chongqing, Peoples R China
[2] Chongqing Univ, Sch Big Data & Software Engn, Chongqing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Graph neural networks; GPU; Cache; Graph partition; NVLink;
DOI
10.1016/j.sysarc.2024.103162
Chinese Library Classification
TP3 [Computing Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Emerging Graph Neural Networks (GNNs) have made significant progress in processing graph-structured data, yet existing GNN frameworks face scalability issues when training large-scale graphs on multiple GPUs. Frequent feature transfers between CPUs and GPUs are a major bottleneck, and current caching schemes do not fully account for the characteristics of multi-GPU environments, leading to inefficient feature extraction. To address these challenges, we propose BGS, an auxiliary framework that accelerates GNN training from a data perspective in multi-GPU environments. First, we introduce a novel training-set partition algorithm that assigns an independent training subset to each GPU, enhancing the spatial locality of node accesses and thereby improving the efficiency of the feature caching strategy. Second, since GPUs can communicate at high speed over NVLink, we design a feature cache placement strategy tailored to multi-GPU environments; it improves the overall hit rate by placing a reasonable amount of redundant cache on each GPU. Evaluations on two representative GNN models, GCN and GraphSAGE, show that BGS significantly improves the hit rate of feature caching in multi-GPU environments and substantially reduces data-loading overhead, achieving a performance improvement of 1.5x to 6.2x over the baseline.
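The abstract describes two mechanisms: a per-GPU partition of the training set, and a cache placement policy that keeps some feature entries redundantly on every GPU so that misses can often be served by a peer GPU over NVLink rather than the CPU. The sketch below illustrates that idea only in outline; the round-robin partition, the redundancy fraction, and all function names are assumptions made for illustration and do not reproduce the algorithms in the paper.

```python
# Illustrative sketch (not BGS itself): partition training nodes per GPU and
# build per-GPU feature caches with a redundant "globally hot" portion that is
# replicated on every GPU, so a local miss may still hit a peer over NVLink.
import numpy as np

def partition_training_nodes(train_nodes, num_gpus):
    """Assign each training node to one GPU (round-robin here; a
    locality-aware partition, as in the paper, is not reproduced)."""
    return [train_nodes[g::num_gpus] for g in range(num_gpus)]

def build_caches(access_freq, parts, cache_size, redundant_frac):
    """Each GPU caches the globally hottest nodes (replicated everywhere)
    plus the hottest nodes of its own partition (unique to that GPU)."""
    num_redundant = int(cache_size * redundant_frac)
    global_hot = set(np.argsort(-access_freq)[:num_redundant])
    caches = []
    for part in parts:
        local_order = part[np.argsort(-access_freq[part])]
        local_hot = [n for n in local_order if n not in global_hot]
        caches.append(global_hot | set(local_hot[: cache_size - num_redundant]))
    return caches

def lookup(node, gpu, caches):
    """Resolve a feature request: local cache, then a peer cache, then CPU."""
    if node in caches[gpu]:
        return "local"
    if any(node in c for g, c in enumerate(caches) if g != gpu):
        return "nvlink_peer"
    return "cpu"

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    num_nodes, num_gpus, cache_size = 10_000, 4, 1_000
    access_freq = rng.zipf(1.5, num_nodes).astype(float)   # skewed access pattern
    parts = partition_training_nodes(np.arange(num_nodes), num_gpus)
    caches = build_caches(access_freq, parts, cache_size, redundant_frac=0.3)
    sample = rng.choice(num_nodes, size=2_000, p=access_freq / access_freq.sum())
    hits = [lookup(n, rng.integers(num_gpus), caches) for n in sample]
    print({k: hits.count(k) for k in ("local", "nvlink_peer", "cpu")})
```

Under this toy model, raising the redundancy fraction trades unique cache capacity for more peer-servable hits, which is the balance the paper's placement strategy is tuning.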
Pages: 13
Related Papers
50 records
  • [1] Characterizing and Understanding Distributed GNN Training on GPUs
    Lin, Haiyang
    Yan, Mingyu
    Yang, Xiaocheng
    Zou, Mo
    Li, Wenming
    Ye, Xiaochun
    Fan, Dongrui
    IEEE COMPUTER ARCHITECTURE LETTERS, 2022, 21 (01) : 21 - 24
  • [2] CoGNN: Efficient Scheduling for Concurrent GNN Training on GPUs
    Sun, Qingxiao
    Liu, Yi
    Yang, Hailong
    Zhang, Ruizhe
    Dun, Ming
    Li, Mingzhen
    Liu, Xiaoyan
    Xiao, Wencong
    Li, Yong
    Luan, Zhongzhi
    Qian, Depei
    SC22: INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS, 2022,
  • [3] Exploring Attention Sparsity to Accelerate Transformer Training on GPUs
    Yoon, Bokyeong
    Lee, Ah-Hyun
    Kim, Jinsung
    Moon, Gordon Euhyun
    IEEE ACCESS, 2024, 12 : 131373 - 131384
  • [4] GNNLab: A Factored System for Sample-based GNN Training over GPUs
    Yang, Jianbang
    Tang, Dahai
    Song, Xiaoniu
    Wang, Lei
    Yin, Qiang
    Chen, Rong
    Yu, Wenyuan
    Zhou, Jingren
    PROCEEDINGS OF THE SEVENTEENTH EUROPEAN CONFERENCE ON COMPUTER SYSTEMS (EUROSYS '22), 2022, : 417 - 434
  • [5] SCGraph: Accelerating Sample-based GNN Training by Staged Caching of Features on GPUs
    He, Yuqi
    Lai, Zhiquan
    Ran, Zhejiang
    Zhang, Lizhi
    Li, Dongsheng
    2022 IEEE INTL CONF ON PARALLEL & DISTRIBUTED PROCESSING WITH APPLICATIONS, BIG DATA & CLOUD COMPUTING, SUSTAINABLE COMPUTING & COMMUNICATIONS, SOCIAL COMPUTING & NETWORKING, ISPA/BDCLOUD/SOCIALCOM/SUSTAINCOM, 2022, : 106 - 113
  • [6] Accelerate Graph Neural Network Training by Reusing Batch Data on GPUs
    Ran, Zhejiang
    Lai, Zhiquan
    Zhang, Lizhi
    Li, Dongsheng
    2021 IEEE INTERNATIONAL PERFORMANCE, COMPUTING, AND COMMUNICATIONS CONFERENCE (IPCCC), 2021,
  • [7] AdaptGear: Accelerating GNN Training via Adaptive Subgraph-Level Kernels on GPUs
    Zhou, Yangjie
    Song, Yaoxu
    Leng, Jingwen
    Liu, Zihan
    Cui, Weihao
    Zhang, Zhendong
    Guo, Cong
    Chen, Quan
    Li, Li
    Guo, Minyi
    PROCEEDINGS OF THE 20TH ACM INTERNATIONAL CONFERENCE ON COMPUTING FRONTIERS 2023, CF 2023, 2023, : 52 - 62
  • [8] TurboGNN: Improving the End-to-End Performance for Sampling-Based GNN Training on GPUs
    Wu, Wenchao
    Shi, Xuanhua
    He, Ligang
    Jin, Hai
    IEEE TRANSACTIONS ON COMPUTERS, 2023, 72 (09) : 2571 - 2584
  • [9] TC-GNN: Bridging Sparse GNN Computation and Dense Tensor Cores on GPUs
    Wang, Yuke
    Feng, Boyuan
    Wang, Zheng
    Huang, Guyue
    Ding, Yufei
    PROCEEDINGS OF THE 2023 USENIX ANNUAL TECHNICAL CONFERENCE, 2023, : 149 - 164
  • [10] Spartan: A Sparsity-Adaptive Framework to Accelerate Deep Neural Network Training on GPUs
    Dong, Shi
    Sun, Yifan
    Agostini, Nicolas Bohm
    Karimi, Elmira
    Lowell, Daniel
    Zhou, Jing
    Cano, Jose
    Abellan, Jose L.
    Kaeli, David
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2021, 32 (10) : 2448 - 2463