Modeling and Predicting Performance of High Performance Computing Applications on Hardware Accelerators

Cited by: 10
Authors
Meswani, Mitesh R. [1]
Carrington, Laura [1]
Unat, Didem
Snavely, Allan [1]
Baden, Scott
Poole, Stephen
Affiliations
[1] UCSD, SDSC, La Jolla, CA USA
Source
2012 IEEE 26TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS & PHD FORUM (IPDPSW) | 2012
Keywords
accelerators; GPU; FPGA; performance prediction; performance modeling; benchmarking; HPC;
DOI
10.1109/IPDPSW.2012.226
Chinese Library Classification number
TP301 [Theory, methods];
Discipline classification code
081202;
Abstract
Computers with hardware accelerators, also referred to as hybrid-core systems, speed up applications by offloading certain compute operations that can run faster on accelerators. Thus, it is not surprising that many of the TOP500 supercomputers use accelerators. However, in addition to procurement cost, significant programming and porting effort is required to realize the potential benefit of such accelerators. Hence, before building such a system it is prudent to answer the question 'what is the projected performance benefit from accelerators for the workloads of interest?'. We address this question by way of a performance-modeling framework that predicts realizable application performance on accelerators rapidly and accurately without going to the considerable effort of porting and tuning. The modeling framework first automatically identifies compute patterns commonly found in scientific applications, which we term idioms, that may benefit from accelerator technology. Next, the framework models the predicted speedup of those idioms if they were to be ported to and run on hardware accelerators. As a proof of concept we characterize two kinds of accelerators: 1) the FPGA accelerators on a Convey HC-1 system and 2) an NVIDIA Fermi GPU accelerator. We model the performance of the gather/scatter and stream idioms, and our predictions show that where these occur in two full-scale HPC applications, Milc and HYCOM, gather/scatter speeds up by as much as 15X and stream by as much as 14X, whereas the overall compute time of Milc improves by 3.4% and HYCOM by 20%.
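
The gap between a per-idiom speedup of up to 15X and an overall improvement of only a few percent follows from how little of the total runtime the idioms may cover. The sketch below illustrates this with a plain Amdahl's-law estimate. It is not the paper's modeling framework; the function names and the example runtime fractions are hypothetical and chosen only for illustration.

```python
def overall_speedup(idiom_fractions, idiom_speedups):
    """Amdahl-style estimate of whole-application speedup.

    idiom_fractions: idiom name -> fraction of baseline runtime it covers (0..1).
    idiom_speedups:  idiom name -> predicted speedup of that idiom on the accelerator.
    Idioms without a predicted speedup are assumed to run at baseline speed.
    """
    covered = sum(idiom_fractions.values())
    if covered > 1.0 + 1e-12:
        raise ValueError("idiom fractions must not sum to more than 1")
    # Time left in non-accelerated code, plus accelerated time of each idiom.
    new_time = 1.0 - covered
    for name, fraction in idiom_fractions.items():
        new_time += fraction / idiom_speedups.get(name, 1.0)
    return 1.0 / new_time


def percent_time_reduction(idiom_fractions, idiom_speedups):
    """Percentage by which total runtime shrinks under the same assumptions."""
    return (1.0 - 1.0 / overall_speedup(idiom_fractions, idiom_speedups)) * 100.0


if __name__ == "__main__":
    # Hypothetical fractions, NOT taken from the paper.
    fractions = {"gather/scatter": 0.02, "stream": 0.01}
    speedups = {"gather/scatter": 15.0, "stream": 14.0}
    print(f"{percent_time_reduction(fractions, speedups):.2f}% less runtime")
```

With fractions this small, even large idiom speedups yield a modest overall gain, which is consistent in spirit with the Milc figure reported in the abstract.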
Pages: 1828-1837
Page count: 10