FPGA-based tsunami simulation: Performance comparison with GPUs, and roofline model for scalability analysis

Cited by: 17
Authors
Nagasu, Kohei [1]
Sano, Kentaro [1]
Kono, Fumiya [2]
Nakasato, Naohito [2]
Affiliations
[1] Tohoku Univ, Grad Sch Informat Sci, Aoba Ku, 6-6-01 Aramaki Aza, Sendai, Miyagi 9808579, Japan
[2] Univ Aizu, Sch Comp Sci & Engn, Ikki Machi Tsuruga, Aizu Wakamatsu, Fukushima 9658580, Japan
Keywords
Tsunami simulation; Stream computing; Custom hardware; FPGA; GPU; Roofline model
DOI
10.1016/j.jpdc.2016.12.015
CLC number
TP301 [Theory, methods]
Discipline code
081202
Abstract
MOST (Method Of Splitting Tsunami) is widely used to solve the shallow water equations (SWEs) for tsunami simulation. This paper presents a high-performance, power-efficient FPGA implementation of MOST for practical tsunami simulation. The custom hardware is based on a stream computing architecture that enables deep pipelining, increasing performance under limited bandwidth. We design a stream processing element (SPE) that combines computing kernels with stencil buffers. We also introduce an SPE array architecture with spatial and temporal parallelism that further exploits the available hardware resources by implementing multiple SPEs with parallel internal pipelines. Our prototype implementation on an Arria 10 FPGA demonstrates that, by introducing non-dimensionalization, the FPGA-based design performs numerically stable tsunami simulation with real ocean-depth data in single precision. We explore the design space of SPE arrays and find that a design of six cascaded SPEs with a single pipeline achieves a sustained performance of 383 GFlops and a performance per power of 8.41 GFlops/W with a stream bandwidth of only 7.2 GB/s. These numbers are 8.6 and 17.2 times higher, respectively, than those of the NVIDIA Tesla K20c GPU, and 1.7 and 7.1 times higher than those of the AMD Radeon R9 280X GPU, for the same tsunami simulation in single precision. Moreover, we propose a roofline model for stream computing with the SPE array in order to investigate the factors of performance degradation and the possible performance improvement for given FPGAs. Using the model, we estimate that an upcoming Stratix 10 GX2800 FPGA can achieve a sustained performance of at most 8.7 TFlops with our SPE array architecture for tsunami simulation. (C) 2016 Elsevier Inc. All rights reserved.
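To make the bandwidth argument concrete, the following is a minimal sketch of the classic roofline bound, attainable performance = min(compute ceiling, bandwidth x operational intensity). It is not the paper's stream-computing roofline model, whose exact form is not given in this abstract. The names `Machine`, `attainable_gflops`, and `cascaded_intensity` are hypothetical, and the assumption that cascading n SPEs multiplies the flops performed per streamed byte by n reflects the general idea of temporal parallelism rather than the authors' own derivation. The only figures taken from the abstract are 383 GFlops and 7.2 GB/s.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    peak_gflops: float    # compute ceiling in GFlop/s (hypothetical input)
    bandwidth_gbs: float  # stream/memory bandwidth in GB/s


def attainable_gflops(m: Machine, intensity_flop_per_byte: float) -> float:
    """Classic roofline: performance is capped by either compute or bandwidth."""
    return min(m.peak_gflops, m.bandwidth_gbs * intensity_flop_per_byte)


def cascaded_intensity(single_spe_intensity: float, n_cascaded: int) -> float:
    """Assumption: each cascaded SPE advances one more time step on the same
    streamed data, so flops per streamed byte scale with the cascade depth."""
    return single_spe_intensity * n_cascaded


if __name__ == "__main__":
    # Intensity implied by the reported six-SPE design: 383 GFlops over 7.2 GB/s.
    implied = 383.0 / 7.2
    print(f"{implied:.1f} flop/byte")
```

Running it prints `53.2 flop/byte`: each byte streamed through the six-SPE design must feed roughly 53 floating-point operations for 7.2 GB/s to sustain 383 GFlops. Under the stated assumption, a single SPE streaming the same data would reach only about one sixth of that intensity and would sit correspondingly lower on the bandwidth slope.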
Pages: 153-169
Number of pages: 17