How to obtain efficient GPU kernels: An illustration using FMM & FGT algorithms

被引:10
作者
Cruz, Felipe A. [2 ]
Layton, Simon K. [1 ]
Barba, L. A. [1 ]
机构
[1] Boston Univ, Dept Mech Engn, Boston, MA 02215 USA
[2] Univ Bristol, Dept Math, Bristol BS8 1TH, Avon, England
基金
英国工程与自然科学研究理事会;
关键词
Fast summation methods; Fast multipole method; Fast Gauss transform; Heterogeneous computing; MULTIPOLE METHOD; SIMULATIONS;
D O I
10.1016/j.cpc.2011.05.002
中图分类号
TP39 [计算机的应用];
学科分类号
081203 ; 0835 ;
摘要
Computing on graphics processors is maybe one of the most important developments in computational science to happen in decades. Not since the arrival of the Beowulf cluster, which combined open source software with commodity hardware to truly democratize high-performance computing, has the community been so electrified. Like then, the opportunity comes with challenges. The formulation of scientific algorithms to take advantage of the performance offered by the new architecture requires rethinking core methods. Here, we have tackled fast summation algorithms (fast multipole method and fast Gauss transform), and applied algorithmic redesign for attaining performance on GPUS. The progression of performance improvements attained illustrates the exercise of formulating algorithms for the massively parallel architecture of the GPU. The end result has been GPU kernels that run at over 500 Gop/s on one NVIDIA TESLA C1060 card, thereby reaching close to practical peak. (C) 2011 Elsevier B.V. All rights reserved.
引用
收藏
页码:2084 / 2098
页数:15
相关论文
共 21 条
[1]  
Asanovic K., 2006, The landscape of parallel computing research: A view from berkeley
[2]   Advances in viscous vortex methods - meshless spatial adaption based on radial basis function interpolation [J].
Barba, LA ;
Leonard, A ;
Allen, CB .
INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN FLUIDS, 2005, 47 (05) :387-421
[3]  
Beatson Rick, 1997, Wavelets, multilevel methods and elliptic PDEs, V1, P1
[4]   High performance direct gravitational N-body simulations on graphics processing units II:: An implementation in CUDA [J].
Belleman, Robert G. ;
Bedorf, Jeroen ;
Portegies Zwart, Simon .
NEW ASTRONOMY, 2008, 13 (02) :103-112
[5]   Application of the fast Gauss transform to option pricing [J].
Broadie, M ;
Yamamoto, Y .
MANAGEMENT SCIENCE, 2003, 49 (08) :1071-1088
[6]  
Cormen TH., 2009, Introduction to Algorithms, V3
[7]   PetFMM-A dynamically load-balancing parallel fast multipole library [J].
Cruz, Felipe A. ;
Knepley, Matthew G. ;
Barba, L. A. .
INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN ENGINEERING, 2011, 85 (04) :403-428
[8]   Characterization of the accuracy of the fast multipole method in particle simulations [J].
Cruz, Felipe A. ;
Barba, L. A. .
INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN ENGINEERING, 2009, 79 (13) :1577-1604
[9]  
Fenley MO, 1996, J COMPUT CHEM, V17, P976, DOI 10.1002/(SICI)1096-987X(199606)17:8<976::AID-JCC7>3.0.CO
[10]  
2-O