MariusGNN: Resource-Efficient Out-of-Core Training of Graph Neural Networks

Citations: 9
Authors
Waleffe, Roger [1]
Mohoney, Jason [1]
Rekatsinas, Theodoros [2]
Venkataraman, Shivaram [1]
Affiliations
[1] Univ Wisconsin Madison, Madison, WI 53706 USA
[2] Swiss Fed Inst Technol, Zurich, Switzerland
Source
PROCEEDINGS OF THE EIGHTEENTH EUROPEAN CONFERENCE ON COMPUTER SYSTEMS, EUROSYS 2023 | 2023
Funding
U.S. National Science Foundation
Keywords
GNNs; GNN Training; Multi-hop Sampling
DOI
10.1145/3552326.3567501
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
We study training of Graph Neural Networks (GNNs) for large-scale graphs. We revisit the premise of using distributed training for billion-scale graphs and show that for graphs that fit in main memory or the SSD of a single machine, out-of-core pipelined training with a single GPU can outperform state-of-the-art (SoTA) multi-GPU solutions. We introduce MariusGNN, the first system that utilizes the entire storage hierarchy, including disk, for GNN training. MariusGNN introduces a series of data organization and algorithmic contributions that 1) minimize the end-to-end time required for training and 2) ensure that models learned with disk-based training exhibit accuracy similar to those fully trained in memory. We evaluate MariusGNN against SoTA systems for learning GNN models and find that single-GPU training in MariusGNN achieves the same level of accuracy up to 8x faster than multi-GPU training in these systems, yielding an order-of-magnitude reduction in monetary cost. MariusGNN is open-sourced at www.marius-project.org.
Pages: 144-161
Page count: 18