How to obtain efficient GPU kernels: An illustration using FMM & FGT algorithms

Cited by: 10
Authors
Cruz, Felipe A. [2]
Layton, Simon K. [1]
Barba, L. A. [1]
Affiliations
[1] Boston Univ, Dept Mech Engn, Boston, MA 02215 USA
[2] Univ Bristol, Dept Math, Bristol BS8 1TH, Avon, England
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK;
Keywords
Fast summation methods; Fast multipole method; Fast Gauss transform; Heterogeneous computing; MULTIPOLE METHOD; SIMULATIONS;
DOI
10.1016/j.cpc.2011.05.002
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Computing on graphics processors is perhaps one of the most important developments in computational science to happen in decades. Not since the arrival of the Beowulf cluster, which combined open source software with commodity hardware to truly democratize high-performance computing, has the community been so electrified. As then, the opportunity comes with challenges. Formulating scientific algorithms to take advantage of the performance offered by the new architecture requires rethinking core methods. Here, we have tackled fast summation algorithms (fast multipole method and fast Gauss transform), and applied algorithmic redesign to attain performance on GPUs. The progression of performance improvements attained illustrates the exercise of formulating algorithms for the massively parallel architecture of the GPU. The end result has been GPU kernels that run at over 500 Gop/s on one NVIDIA Tesla C1060 card, thereby reaching close to practical peak. (C) 2011 Elsevier B.V. All rights reserved.
Pages: 2084-2098
Number of pages: 15