Evaluating Multi-core and Many-core Architectures Through Accelerating an Alternating Direction Implicit CFD Solver

Times cited: 7
Authors
Deng, Liang [1 ,2 ]
Fang, Jianbin [3 ]
Wang, Fang [3 ]
Bai, Hanli [1 ]
Affiliations
[1] China Aerodynam Res & Dev Ctr, Computat Aerodynam Inst, Mianyang, Peoples R China
[2] Natl Univ Def Technol, Collaborat Innovat Ctr High Performance Comp, Changsha, Hunan, Peoples R China
[3] Natl Univ Def Technol, Software Inst, Sch Comp, Changsha, Hunan, Peoples R China
Source
2016 15TH INTERNATIONAL SYMPOSIUM ON PARALLEL AND DISTRIBUTED COMPUTING (ISPDC) | 2016
Keywords
performance; programmability; optimization techniques; alternating direction implicit; CFD solver; Ivy Bridge; Xeon Phi; GPU; CUDA; OpenACC
DOI
10.1109/ISPDC.2016.9
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
In this paper, we accelerate a double-precision alternating direction implicit (ADI) solver for the three-dimensional compressible Navier-Stokes equations, taken from our in-house computational fluid dynamics (CFD) software, on the latest multi-core and many-core architectures: an Intel Ivy Bridge CPU, an Intel Xeon Phi 7110P coprocessor and an NVIDIA Kepler K20c GPU. For the GPU platform, we develop both an OpenACC-based and a CUDA-based version of the ADI solver. To achieve high performance, we apply a series of optimization techniques. For the Ivy Bridge CPU and the Xeon Phi, we focus on three categories of optimization to maximize performance: thread parallelism for multi-/many-core scaling, data parallelism to exploit the SIMD units, and improved on-chip data reuse. We also provide an in-depth analysis of the performance differences between Ivy Bridge and Xeon Phi. Our numerical experiments show that the proposed CUDA-based ADI solver achieves a 9.7x speedup on a Kepler GPU over a naive serial version, and that our optimization techniques improve the performance of the ADI solver by 2.5x on two Ivy Bridge CPUs and by 1.7x on the Intel Xeon Phi coprocessor. We also observe that the OpenACC-based version runs about 29% slower than the carefully hand-optimized CUDA-based version. In addition, we systematically evaluate the programmability of the three platforms. Our insights help programmers select the right platform and a suitable programming model for their target applications.
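The abstract does not describe the solver's internals. As a hedged illustration of what "thread parallelism" can mean for an ADI solver: ADI schemes commonly split each implicit step into sweeps along one coordinate direction, and each sweep reduces to many independent one-dimensional line solves that can be distributed over cores. The minimal C sketch below assumes a scalar tridiagonal system with constant coefficients (-1, 4, -1) and hypothetical grid sizes `NX` and `NY`; the functions `thomas_solve`, `sweep_x` and `sweep_residual` are illustrative names, not the authors' code. The paper's solver for the compressible Navier-Stokes equations may use block or diagonalized systems, which this sketch does not reproduce.

```c
#include <assert.h>
#include <math.h>

#define NX 8   /* hypothetical number of grid points per line */
#define NY 4   /* hypothetical number of independent lines */

/* Thomas algorithm for a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i].
 * a[0] and c[n-1] are ignored. No pivoting: assumes diagonal dominance.
 * cp and dp are caller-provided scratch arrays of length n. */
void thomas_solve(int n, const double *a, const double *b, const double *c,
                  const double *d, double *cp, double *dp, double *x)
{
    assert(n >= 1);
    cp[0] = (n > 1) ? c[0] / b[0] : 0.0;
    dp[0] = d[0] / b[0];
    for (int i = 1; i < n; ++i) {            /* forward elimination */
        double m = b[i] - a[i] * cp[i - 1];
        cp[i] = (i < n - 1) ? c[i] / m : 0.0;
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m;
    }
    x[n - 1] = dp[n - 1];
    for (int i = n - 2; i >= 0; --i)         /* back substitution */
        x[i] = dp[i] - cp[i] * x[i + 1];
}

/* One x-direction sweep: each row j is an independent tridiagonal system
 * (assumed coefficients -1, 4, -1), so rows can be spread over threads.
 * Without OpenMP enabled the pragma is ignored and the loop runs serially. */
void sweep_x(const double *rhs, double *u)
{
    #pragma omp parallel for
    for (int j = 0; j < NY; ++j) {
        double a[NX], b[NX], c[NX], cp[NX], dp[NX];  /* per-thread scratch */
        for (int i = 0; i < NX; ++i) { a[i] = -1.0; b[i] = 4.0; c[i] = -1.0; }
        thomas_solve(NX, a, b, c, rhs + j * NX, cp, dp, u + j * NX);
    }
}

/* Largest |(A u)_i - rhs_i| over all rows, for the same -1, 4, -1 operator. */
double sweep_residual(const double *rhs, const double *u)
{
    double worst = 0.0;
    for (int j = 0; j < NY; ++j) {
        const double *r = u + j * NX;
        for (int i = 0; i < NX; ++i) {
            double left  = (i > 0)      ? r[i - 1] : 0.0;
            double right = (i < NX - 1) ? r[i + 1] : 0.0;
            double res = fabs(-left + 4.0 * r[i] - right - rhs[j * NX + i]);
            if (res > worst) worst = res;
        }
    }
    return worst;
}
```

Because the rows share no data, the same loop structure maps naturally onto OpenMP threads on a CPU or Xeon Phi, or onto one GPU thread per line in CUDA or OpenACC; the sequential recurrence inside each line solve is why such solvers parallelize across lines rather than within them.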
Pages: 1-10
Page count: 10