Accelerating Sparse Linear Algebra Using Graphics Processing Units

Cited by: 1
Authors
Spagnoli, Kyle E. [1]
Humphrey, John R. [1]
Price, Daniel K. [1]
Kelmelis, Eric J. [1]
Affiliations
[1] EM Photon Inc, Newark, DE 19711 USA
Source
MODELING AND SIMULATION FOR DEFENSE SYSTEMS AND APPLICATIONS VI | 2011 / Vol. 8060
Keywords
graphics processing unit; GPU; accelerated linear algebra; parallel computing; sparse linear algebra; finite element methods; ALGORITHM;
DOI
10.1117/12.884169
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
The modern graphics processing unit (GPU) found in many standard personal computers is a highly parallel math processor capable of over 1 TFLOPS of peak computational throughput at a cost similar to that of a high-end CPU, with an excellent FLOPS-to-watt ratio. High-level sparse linear algebra operations are computationally intense, often requiring large numbers of parallel operations, and would seem a natural fit for the processing power of the GPU. Our work is on a GPU-accelerated implementation of sparse linear algebra routines. We present results from both direct and iterative sparse system solvers. The GPU execution model featured by NVIDIA GPUs based on CUDA demands very strong parallelism, requiring between hundreds and thousands of simultaneous operations to achieve high performance. Some constructs from linear algebra map extremely well to the GPU and others map poorly. CPUs, on the other hand, do well at smaller-order parallelism and perform acceptably during low-parallelism code segments. Our work addresses this via a hybrid processing model, in which the CPU and GPU work simultaneously to produce results. In many cases, this is accomplished by allowing each platform to do the work it performs most naturally. For example, the CPU is responsible for the graph theory portion of the direct solvers while the GPU simultaneously performs the low-level linear algebra routines.
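As a rough illustration of the kind of low-level routine the abstract describes, the sketch below shows a sparse matrix-vector product in compressed sparse row (CSR) form and a small iterative solver built on it. It is not the authors' implementation: the abstract names neither a storage format nor a specific iterative method, so CSR and conjugate gradient are assumptions, as are all function names here. The code is plain sequential Python; the point is only that each row of the product is independent of the others, which is the sort of work that could be spread across many GPU threads.

```python
def csr_matvec(indptr, indices, data, x):
    """Compute y = A @ x for a matrix A stored in CSR form.

    Each row's sum is independent of every other row, so a parallel
    implementation could assign one row (or a group of rows) per thread.
    """
    y = [0.0] * (len(indptr) - 1)
    for row in range(len(y)):
        acc = 0.0
        for k in range(indptr[row], indptr[row + 1]):
            acc += data[k] * x[indices[k]]
        y[row] = acc
    return y


def dot(u, v):
    """Dense dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(indptr, indices, data, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite CSR matrix A.

    Conjugate gradient is an assumed example of an iterative solver;
    the abstract does not say which methods were implemented.
    """
    x = [0.0] * len(b)
    r = list(b)            # residual b - A x, with x = 0 initially
    p = list(r)            # search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        ap = csr_matvec(indptr, indices, data, p)
        alpha = rs_old / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Hypothetical 3x3 test matrix [[4, -1, 0], [-1, 4, -1], [0, -1, 4]] in CSR form.
INDPTR = [0, 2, 5, 7]
INDICES = [0, 1, 0, 1, 2, 1, 2]
DATA = [4.0, -1.0, -1.0, 4.0, -1.0, -1.0, 4.0]
B = [3.0, 2.0, 3.0]  # chosen so that the exact solution is [1, 1, 1]

solution = conjugate_gradient(INDPTR, INDICES, DATA, B)
```

In this sketch the matrix-vector product dominates each iteration, while the remaining steps are short vector updates and reductions.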
Pages: 9