Accelerating sparse Cholesky factorization on GPUs

被引:34
作者
Rennich, Steven C. [1 ]
Stosic, Darko [1 ]
Davis, Timothy A. [2 ]
机构
[1] NVIDIA, Santa Clara, CA 95050 USA
[2] Texas A&M Univ, CSE, College Stn, TX USA
基金
美国国家科学基金会;
关键词
Sparse; Cholesky; Factorization; GPU; Parallel;
D O I
10.1016/j.parco.2016.06.004
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
Sparse factorization is a fundamental tool in scientific computing. As the major component of a sparse direct solver, it represents the dominant computational cost for many analyses. For factorizations which involve sufficient dense math, the substantial computational capability provided by GPUs (Graphics Processing Units) can help alleviate this cost. However, for many other cases, the prevalence of small/irregular dense math and the relatively slow communication between the host and device over the PCIe bus, make it challenging to significantly accelerate sparse factorization using the GPU. In this paper we describe a left-looking supernodal Cholesky factorization algorithm which permits improved utilization of the GPU when factoring sparse matrices. The central idea is to stream subtrees of the elimination tree through the GPU and perform the factorization of each subtree entirely on the GPU. This avoids the majority of the PCIe communication without the need for a complex task scheduler. Importantly, within these sub trees, many independent, small, dense operations are batched to minimize kernel launch overhead and many of these batched kernels are executed concurrently to maximize device utilization. Performance results for commonly studied matrices are presented along with suggested actions for further optimization. (C) 2016 Elsevier B.V. All rights reserved.
引用
收藏
页码:140 / 150
页数:11
相关论文
共 20 条
[1]  
Anderson E, 1990, P 1990 ACM IEEE C SU, P2, DOI 10.1109/superc.1990.129995
[2]  
[Anonymous], 2014, CUDA C PROGRAMMING G
[3]  
[Anonymous], ISITR677
[4]   An updated set of Basic Linear Algebra Subprograms (BLAS) [J].
Blackford, LS ;
Demmel, J ;
Dongarra, J ;
Duff, I ;
Hammarling, S ;
Henry, G ;
Heroux, M ;
Kaufman, L ;
Lumsdaine, A ;
Petitet, A ;
Pozo, R ;
Remington, K ;
Whaley, RC .
ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE, 2002, 28 (02) :135-151
[5]   Algorithm 887: CHOLMOD, Supernodal Sparse Cholesky Factorization and Update/Downdate [J].
Chen, Yanqing ;
Davis, Timothy A. ;
Hager, William W. ;
Rajamanickam, Sivasankaran .
ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE, 2008, 35 (03)
[6]  
Davis TA, 2006, FUND ALGORITHMS, V2, P1, DOI 10.1137/1.9780898718881
[7]   The University of Florida Sparse Matrix Collection [J].
Davis, Timothy A. ;
Hu, Yifan .
ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE, 2011, 38 (01)
[8]  
Fatica M., 2009, Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units, P46, DOI DOI 10.1145/1513895.1513901
[9]  
George T., 2011, IEEE INT PAR DISTR P
[10]   A numerical evaluation of sparse direct solvers for the solution of large sparse symmetric linear systems of equations [J].
Gould, Nicholas I. M. ;
Scott, Jennifer A. ;
Hu, Yifan .
ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE, 2007, 33 (02)