On the effective implementation of a boundary element code on graphics processing units using an out-of-core LU algorithm

Cited by: 5
Authors
D'Azevedo, E. F. [1]
Fata, S. Nintcheu [1]
Affiliations
[1] Oak Ridge Natl Lab, Comp Sci & Math Div, Oak Ridge, TN 37831 USA
Keywords
Collocation approximation; Boundary element method; Triangulated boundary; Graphics processor; Factorization
DOI
10.1016/j.enganabound.2012.02.014
Chinese Library Classification
T [Industrial Technology]
Discipline classification code
08
Abstract
A collocation boundary element code for solving the three-dimensional Laplace equation, publicly available from http://intetec.org, has been adapted to run on an Nvidia Tesla general-purpose graphics processing unit (GPU). Global matrix assembly and LU factorization of the resulting dense matrix are performed on the GPU. Out-of-core techniques are used to solve problems larger than the available GPU memory. The code achieved about a 10-fold speedup in matrix assembly over a single CPU core and about 56 Gflop/s in the LU factorization using only 512 Mbytes of GPU memory. Details of the GPU implementation and comparisons with the standard sequential algorithm are included to illustrate the performance of the GPU code. (C) 2012 Elsevier Ltd. All rights reserved.
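The out-of-core idea mentioned in the abstract can be illustrated with a small sketch. The code below is not the authors' implementation: it is a pure-Python, left-looking LU factorization that keeps only one column panel "in core" at a time, with the full matrix standing in for slower storage. The function names (lu_out_of_core, reconstruct) and the panel_width parameter are hypothetical, and pivoting is omitted for brevity, so the sketch assumes a matrix (for example, a diagonally dominant one) that can be factored safely without row exchanges.

```python
def lu_out_of_core(a, panel_width):
    """Factor the square matrix `a` (list of rows) in place into unit-lower L and upper U.

    Illustrative assumption: `a` plays the role of slow storage, and only the
    current column panel is held in fast memory. No pivoting is performed.
    """
    n = len(a)
    for start in range(0, n, panel_width):
        end = min(start + panel_width, n)
        width = end - start
        # "Load" the current column panel into fast memory.
        panel = [row[start:end] for row in a]
        # Left-looking step: apply all previously factored columns of L.
        for k in range(start):
            for j in range(width):
                u_kj = panel[k][j]  # row k of the panel is final once columns < k are applied
                for i in range(k + 1, n):
                    panel[i][j] -= a[i][k] * u_kj
        # Factor the panel itself with ordinary in-core elimination.
        for k in range(start, end):
            kc = k - start
            pivot = panel[k][kc]
            if pivot == 0.0:
                raise ZeroDivisionError("zero pivot; this sketch does not pivot")
            for i in range(k + 1, n):
                panel[i][kc] /= pivot
                l_ik = panel[i][kc]
                for j in range(kc + 1, width):
                    panel[i][j] -= l_ik * panel[k][j]
        # "Store" the factored panel back to slow storage.
        for i in range(n):
            a[i][start:end] = panel[i]
    return a


def reconstruct(lu):
    """Multiply the packed factors back together to recover the original matrix."""
    n = len(lu)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(min(i, j) + 1):
                l_ik = 1.0 if k == i else lu[i][k]
                out[i][j] += l_ik * lu[k][j]
    return out
```

In a GPU setting, the panel would typically reside in device memory while the remaining matrix stays in host memory, and the inner loops would be replaced by dense linear algebra kernels. A production code would also need partial pivoting.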
Pages: 1246-1255
Number of pages: 10