Evaluating Multi-core and Many-core Architectures Through Accelerating an Alternating Direction Implicit CFD Solver

Times cited: 7

Authors:

Deng, Liang [1,2]

Fang, Jianbin [3]

Wang, Fang [3]

Bai, Hanli [1]

Affiliations:

[1] China Aerodynam Res & Dev Ctr, Computat Aerodynam Inst, Mianyang, Peoples R China

[2] Natl Univ Def Technol, Collaborat Innovat Ctr High Performance Comp, Changsha, Hunan, Peoples R China

[3] Natl Univ Def Technol, Software Inst, Sch Comp, Changsha, Hunan, Peoples R China

Source:

2016 15TH INTERNATIONAL SYMPOSIUM ON PARALLEL AND DISTRIBUTED COMPUTING (ISPDC) | 2016

Keywords:

performance; programmability; optimization techniques; alternating direction implicit; CFD solver; Ivy Bridge; Xeon Phi; GPU; CUDA; OpenACC

DOI:

10.1109/ISPDC.2016.9

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

In this paper, we accelerate a double-precision alternating direction implicit (ADI) solver for the three-dimensional compressible Navier-Stokes equations, taken from our in-house computational fluid dynamics (CFD) software, on the latest multi-core and many-core architectures: an Intel Ivy Bridge CPU, an Intel Xeon Phi 7110P coprocessor, and an NVIDIA Kepler K20c GPU. For the GPU platform, we develop both OpenACC-based and CUDA-based versions of the ADI solver. To achieve high performance, we apply a series of optimization techniques. For the Ivy Bridge CPU and the Xeon Phi, we focus on three categories: thread parallelism for multi-/many-core scaling, data parallelism to exploit the SIMD units, and improved on-chip data reuse. We also provide an in-depth analysis of the performance differences between Ivy Bridge and Xeon Phi. Our numerical experiments show that the CUDA-based ADI solver achieves a 9.7x speedup on a Kepler GPU over a naive serial version, and that our optimization techniques improve the performance of the ADI solver by 2.5x on two Ivy Bridge CPUs and by 1.7x on the Intel Xeon Phi coprocessor. We also observe that the OpenACC-based version runs about 29% slower than the CUDA-based version with careful manual optimizations. In addition, we systematically evaluate the programmability of the three platforms. Our insights help programmers select the right platform and a suitable programming model for their target applications.
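An ADI step replaces one large implicit system with many small banded systems, one per grid line in the current sweep direction, which is where the thread and SIMD parallelism discussed above comes from. The sketch below is not the authors' code: it assumes scalar, diagonally dominant tridiagonal systems (a simplification; the abstract does not say which factorization the solver uses) and solves them with the Thomas algorithm. The function name `thomas_batched` and the `[k][line]` array layout are illustrative assumptions. Elimination is sequential along a line (index `k`), but the inner loop over lines (index `j`) carries no dependence, so it is the loop a C implementation would split across threads or vectorize.

```python
from typing import List

Grid = List[List[float]]


def thomas_batched(a: Grid, b: Grid, c: Grid, d: Grid) -> Grid:
    """Solve many independent tridiagonal systems with the Thomas algorithm.

    Arrays are laid out as [k][j]: k runs along the sweep direction and j
    indexes independent grid lines. For line j, row k reads
        a[k][j]*x[k-1][j] + b[k][j]*x[k][j] + c[k][j]*x[k+1][j] = d[k][j],
    with a[0][j] and c[n-1][j] unused. No pivoting is done, so the systems
    are assumed to be diagonally dominant.
    """
    n = len(b)     # points per line (sweep direction)
    m = len(b[0])  # number of independent lines
    cp = [[0.0] * m for _ in range(n)]  # modified super-diagonal
    dp = [[0.0] * m for _ in range(n)]  # modified right-hand side
    x = [[0.0] * m for _ in range(n)]

    # Forward elimination: sequential in k, independent across lines j.
    for j in range(m):
        cp[0][j] = c[0][j] / b[0][j]
        dp[0][j] = d[0][j] / b[0][j]
    for k in range(1, n):
        for j in range(m):  # dependence-free loop: candidate for threads/SIMD
            denom = b[k][j] - a[k][j] * cp[k - 1][j]
            cp[k][j] = c[k][j] / denom
            dp[k][j] = (d[k][j] - a[k][j] * dp[k - 1][j]) / denom

    # Back substitution: again sequential in k, independent across lines j.
    for j in range(m):
        x[n - 1][j] = dp[n - 1][j]
    for k in range(n - 2, -1, -1):
        for j in range(m):
            x[k][j] = dp[k][j] - cp[k][j] * x[k + 1][j]
    return x
```

Keeping the line index innermost makes the `j` loop unit-stride. In C, that suits a `#pragma omp simd` loop, while thread parallelism would partition the lines into chunks, with each thread running both `k` sweeps over its own chunk. Whether the paper's solver uses this particular layout is not stated in the abstract.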


Pages: 1-10

Page count: 10
