Evaluating Multi-core and Many-core Architectures Through Accelerating an Alternating Direction Implicit CFD Solver

Cited by: 7

Authors:

Deng, Liang [1, 2]

Fang, Jianbin [3]

Wang, Fang [3]

Bai, Hanli [1]

Affiliations:

[1] China Aerodynam Res & Dev Ctr, Computat Aerodynam Inst, Mianyang, Peoples R China

[2] Natl Univ Def Technol, Collaborat Innovat Ctr High Performance Comp, Changsha, Hunan, Peoples R China

[3] Natl Univ Def Technol, Software Inst, Sch Comp, Changsha, Hunan, Peoples R China

Source:

2016 15th International Symposium on Parallel and Distributed Computing (ISPDC) | 2016

Keywords:

performance; programmability; optimization techniques; alternating direction implicit; CFD solver; Ivy Bridge; Xeon Phi; GPU; CUDA; OpenACC;

DOI:

10.1109/ISPDC.2016.9

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

In this paper, we accelerate a double-precision alternating direction implicit (ADI) solver for the three-dimensional compressible Navier-Stokes equations, taken from our in-house computational fluid dynamics (CFD) software, on the latest multi-core and many-core architectures (Intel Ivy Bridge CPU, Intel Xeon Phi 7110P coprocessor, and NVIDIA Kepler K20c GPU). For the GPU platform, we develop both OpenACC-based and CUDA-based versions of the ADI solver. To achieve high performance, we use a series of optimization techniques. For the Ivy Bridge CPU and Xeon Phi, we focus on three categories of optimization: thread parallelism for multi-/many-core scaling, data parallelism to exploit the SIMD mechanism, and improved on-chip data reuse. We also provide an in-depth analysis of the performance differences between Ivy Bridge and Xeon Phi. Our numerical experiments show that the proposed CUDA-based ADI solver achieves a speedup of 9.7x on a Kepler GPU over a naive serial version, and that our optimization techniques improve the performance of the ADI solver by 2.5x on two Ivy Bridge CPUs and 1.7x on the Intel Xeon Phi coprocessor. We also observe that the OpenACC-based version runs around 29% slower than the CUDA-based one with careful manual optimizations. In addition, we systematically evaluate the programmability of the three platforms. Our insights help programmers select the right platform and a suitable programming model for their target applications.
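The abstract does not describe the solver's internals, so the following is only a generic sketch of why ADI schemes suit the thread-level parallelism it mentions. ADI methods typically split an implicit update into one-dimensional sweeps, each of which reduces to many independent tridiagonal (or block-tridiagonal) systems along grid lines. The sketch assumes a scalar model operator `(I - r * D_xx)` with zero Dirichlet boundaries in place of the compressible Navier-Stokes system; the function names `thomas_solve` and `implicit_sweep_x` and the parameter `r` are hypothetical and do not come from the paper.

```python
# Illustrative sketch only: this is NOT the paper's solver.
# All names and the scalar model operator are assumptions.

def thomas_solve(lower, diag, upper, rhs):
    """Solve one tridiagonal system with the Thomas algorithm.

    lower[i] multiplies x[i-1] (lower[0] is unused),
    diag[i] multiplies x[i],
    upper[i] multiplies x[i+1] (upper[-1] is unused).
    """
    n = len(diag)
    c = [0.0] * n  # modified upper diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):  # forward elimination
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def implicit_sweep_x(field, r):
    """Solve (I - r * D_xx) u_new = u_old along every row of a 2D field.

    Each row is an independent tridiagonal solve, so rows can be
    distributed across threads. Zero Dirichlet boundaries are assumed.
    """
    n = len(field[0])
    lower = [-r] * n
    diag = [1.0 + 2.0 * r] * n
    upper = [-r] * n
    return [thomas_solve(lower, diag, upper, row) for row in field]
```

In a full ADI step, a sweep like this would be repeated along each coordinate direction in turn. The loop over rows is the natural place for thread parallelism, while the recurrence inside `thomas_solve` is sequential along the line, which is one reason vectorization and data reuse need separate attention on wide-SIMD hardware.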

