MariusGNN: Resource-Efficient Out-of-Core Training of Graph Neural Networks

Cited by: 9

Authors:

Waleffe, Roger [1]

Mohoney, Jason [1]

Rekatsinas, Theodoros [2]

Venkataraman, Shivaram [1]

Affiliations:

[1] University of Wisconsin-Madison, Madison, WI 53706, USA

[2] Swiss Federal Institute of Technology, Zurich, Switzerland

Source:

PROCEEDINGS OF THE EIGHTEENTH EUROPEAN CONFERENCE ON COMPUTER SYSTEMS, EUROSYS 2023 | 2023

Funding:

U.S. National Science Foundation;

Keywords:

GNNs; GNN Training; Multi-hop Sampling;

DOI:

10.1145/3552326.3567501

Chinese Library Classification:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

We study training of Graph Neural Networks (GNNs) for large-scale graphs. We revisit the premise of using distributed training for billion-scale graphs and show that for graphs that fit in main memory or on the SSD of a single machine, out-of-core pipelined training with a single GPU can outperform state-of-the-art (SoTA) multi-GPU solutions. We introduce MariusGNN, the first system that utilizes the entire storage hierarchy, including disk, for GNN training. MariusGNN introduces a series of data organization and algorithmic contributions that 1) minimize the end-to-end time required for training and 2) ensure that models learned with disk-based training exhibit accuracy similar to those fully trained in memory. We evaluate MariusGNN against SoTA systems for learning GNN models and find that single-GPU training in MariusGNN achieves the same level of accuracy up to 8x faster than multi-GPU training in these systems, thus yielding an order-of-magnitude reduction in monetary cost. MariusGNN is open-sourced at www.marius-project.org.


Pages: 144-161

Page count: 18

