Reducing Communication in Graph Neural Network Training

Cited by: 38
Authors
Tripathy, Alok [1]
Yelick, Katherine
Buluc, Aydin
Affiliations
[1] Univ Calif Berkeley, Elect Engn & Comp Sci, Berkeley, CA 94720 USA
Source
PROCEEDINGS OF SC20: THE INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS (SC20) | 2020
Funding
U.S. National Science Foundation
Keywords
Graph neural networks; distributed training; communication-avoiding algorithms; MATRIX MULTIPLICATION; DESIGN;
DOI
10.1109/sc41405.2020.00074
CLC number
TP [Automation and computer technology]
Discipline code
0812
Abstract
Graph Neural Networks (GNNs) are powerful and flexible neural networks that use the naturally sparse connectivity information of the data. GNNs represent this connectivity as sparse matrices, which have lower arithmetic intensity and thus higher communication costs compared to dense matrices, making GNNs harder to scale to high concurrencies than convolutional or fully-connected neural networks. We introduce a family of parallel algorithms for training GNNs and show that they can asymptotically reduce communication compared to previous parallel GNN training methods. We implement these algorithms, which are based on 1D, 1.5D, 2D, and 3D sparse-dense matrix multiplication, using torch.distributed on GPU-equipped clusters. Our algorithms optimize communication across the full GNN training pipeline. We train GNNs on over a hundred GPUs on multiple datasets, including a protein network with over a billion edges.
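As a rough illustration of the kernel the abstract refers to (not the authors' implementation), the dominant operation in GNN training is a sparse-dense matrix multiply Y = A H, where A is the sparse adjacency matrix and H the dense feature matrix. The sketch below simulates, in a single process with NumPy/SciPy, a 1D block-row partition of this product: each simulated process owns a block of rows of A and H, and must gather remote feature rows before its local multiply. That gather is the communication volume the paper's 1.5D/2D/3D variants reduce; the function name `spmm_1d` and the simulation setup are illustrative assumptions.

```python
import numpy as np
import scipy.sparse as sp

def spmm_1d(A, H, p):
    """Simulate 1D block-row-partitioned SpMM: Y = A @ H.

    Each of the p simulated processes owns a contiguous block of rows
    of the sparse matrix A and of the dense matrix H. To form its rows
    of Y, a process needs feature rows owned by other processes; a real
    distributed implementation would fetch them with an all-gather
    (the communication step modeled by H_full below).
    """
    n = A.shape[0]
    bounds = np.linspace(0, n, p + 1, dtype=int)
    # All-gather: reassemble the full dense matrix from per-process blocks.
    H_full = np.vstack([H[bounds[i]:bounds[i + 1]] for i in range(p)])
    # Local computation: each process multiplies its row block of A.
    blocks = [A[bounds[i]:bounds[i + 1]] @ H_full for i in range(p)]
    return np.vstack(blocks)

rng = np.random.default_rng(0)
A = sp.random(8, 8, density=0.3, random_state=0, format="csr")
H = rng.standard_normal((8, 4))
Y = spmm_1d(A, H, p=4)
assert np.allclose(Y, A @ H)  # matches the unpartitioned product
```

In this 1D scheme every process receives all n rows of H per multiply; the paper's higher-dimensional partitions trade extra replication for asymptotically less communication per process.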
Pages: 14