A Message-Driven, Multi-GPU Parallel Sparse Triangular Solver

Cited by: 0
Authors
Ding, Nan [1]
Liu, Yang [2]
Williams, Samuel [1]
Li, Xiaoye S. [2]
Affiliations
[1] Lawrence Berkeley Natl Lab, Computat Res Div, Berkeley, CA 94720 USA
[2] Lawrence Berkeley Natl Lab, Scalable Solvers Grp, Berkeley, CA 94720 USA
Source
PROCEEDINGS OF THE 2021 SIAM CONFERENCE ON APPLIED AND COMPUTATIONAL DISCRETE ALGORITHMS, ACDA21 | 2021
Keywords
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Sparse triangular solve (SpTRSV) is used in conjunction with sparse LU factorization to solve sparse linear systems, either as a direct solver or as a preconditioner. As GPUs have become first-class compute citizens, designing an efficient and scalable SpTRSV for multi-GPU HPC systems is imperative. In this paper, we leverage the GPU-initiated data transfers of NVSHMEM to implement and evaluate a multi-GPU SpTRSV. We create a novel producer-consumer paradigm to manage the computation and communication in SpTRSV and implement it using two CUDA streams. Our multi-GPU SpTRSV implementation using CUDA streams achieves a 3.7x speedup when using twelve GPUs (two nodes) relative to our implementation on a single GPU, and up to 6.1x compared to cuSPARSE csrsv2() over the range of one to eighteen GPUs. To further explain the observed performance, and to identify the key matrix features that determine the potential benefit of using multiple GPUs, we extend the critical path model of SpTRSV to GPUs. We demonstrate that our performance model helps explain various aspects of performance and performance bottlenecks on multi-GPU systems and motivates code optimizations.
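
For readers unfamiliar with the kernel, the following is a minimal serial sketch of a sparse lower-triangular solve and of the dependency levels that a critical path analysis builds on. It is not the authors' multi-GPU implementation: there is no NVSHMEM, no CUDA streams, and no producer-consumer scheduling here. The function names `sptrsv_lower_csr` and `dependency_levels`, the CSR input layout, and the small test matrix are assumptions made only for illustration.

```python
def sptrsv_lower_csr(indptr, indices, data, b):
    """Solve L x = b by forward substitution, with L lower triangular in CSR form.

    Hypothetical helper for illustration; a serial reference, not a GPU kernel.
    """
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        acc = b[i]
        diag = None
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]
            if j == i:
                diag = data[k]            # diagonal entry L[i][i]
            elif j < i:
                acc -= data[k] * x[j]     # x[j] is already known
            else:
                raise ValueError("matrix is not lower triangular")
        if diag is None or diag == 0.0:
            raise ZeroDivisionError(f"zero or missing diagonal in row {i}")
        x[i] = acc / diag
    return x


def dependency_levels(indptr, indices):
    """Return, for each row, its depth in the dependency graph of the solve.

    Rows at the same level do not depend on each other, so they could be
    solved concurrently; max(level) + 1 is the length of the longest
    dependency chain (an assumed, simplified stand-in for a critical path).
    """
    n = len(indptr) - 1
    level = [0] * n
    for i in range(n):
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]
            if j < i:
                level[i] = max(level[i], level[j] + 1)
    return level


# Illustrative 4x4 lower-triangular matrix (made-up values):
# [[2, 0, 0, 0],
#  [1, 1, 0, 0],
#  [0, 0, 4, 0],
#  [0, 3, 1, 2]]
indptr = [0, 1, 3, 4, 7]
indices = [0, 0, 1, 2, 1, 2, 3]
data = [2.0, 1.0, 1.0, 4.0, 3.0, 1.0, 2.0]
b = [2.0, 3.0, 8.0, 10.0]

x = sptrsv_lower_csr(indptr, indices, data, b)
levels = dependency_levels(indptr, indices)
print(x)
print(levels)
```

In this toy matrix, rows 0 and 2 have no dependencies and share a level, while row 3 must wait for rows 1 and 2. The number of levels bounds how much parallelism any implementation can extract from a given matrix, which is the kind of matrix feature a critical path model reasons about.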
Pages: 147-159
Number of pages: 13