Finite-difference time-domain method (FDTD);
Graphical processing units (GPUs);
Compute-unified device architecture (CUDA);
Parallel and multi-core programming;
Fortran;
D O I:
10.1016/j.cpc.2013.01.006
中图分类号:
TP39 [计算机的应用];
学科分类号:
081203 ;
0835 ;
摘要:
A detailed description of programming the three-dimensional finite-difference time-domain (FDTD) method to run on graphical processing units (GPUs) using CUDA Fortran is presented. Two FDTD-to-CUDA thread-block mapping designs are investigated and their performances compared. Comparative assessment of trade-offs between GPU's shared memory and L1 cache is also discussed. This presentation is for the benefit of FDTD programmers who work exclusively with Fortran and are reluctant to port their codes to C in order to utilize GPU computing. The derived CUDA Fortran code is compared with an optimized CPU version that runs on a workstation-class CPU to present a realistic GPU to CPU run time comparison and thus help in making better informed investment decisions on FDTD code redesigns and equipment upgrades. All analyses are mirrored with CUDA C simulations to put in perspective the present state of CUDA Fortran development. (C) 2013 Elsevier B.V. All rights reserved.