Evaluating multi-core and many-core architectures through accelerating the three-dimensional Lax-Wendroff correction stencil

Cited by: 8
Authors
You, Yang [1]
Fu, Haohuan [1, 4]
Song, Shuaiwen Leon [2]
Dehnavi, Maryam Mehri [3]
Gan, Lin [1, 4]
Huang, Xiaomeng [1, 4]
Yang, Guangwen [1, 4]
Affiliations
[1] Tsinghua Univ, Dept Comp Sci & Technol, Beijing 100084, Peoples R China
[2] Pacific NW Natl Lab, Performance Anal Lab, Richland, WA 99352 USA
[3] MIT, Comp Sci & Artificial Intelligence Lab, Cambridge, MA 02139 USA
[4] Tsinghua Univ, Key Lab Earth Syst Modeling, Minist Educ, Beijing 100084, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Complex stencil; 3D wave forward modeling; Kepler GPU; Intel Xeon Phi; optimization techniques; performance power analysis; WAVE-PROPAGATION; HIGH-ORDER; GPU; PROCESSORS; POWER; CARDS;
DOI
10.1177/1094342014524807
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Wave propagation forward modeling is a widely used computational method in oil and gas exploration. The iterative stencil loops in such problems have broad applications in scientific computing. However, executing such loops can be highly time-consuming, which greatly limits their performance and power efficiency. In this paper, we accelerate the forward-modeling technique on the latest multi-core and many-core architectures: Intel(R) Sandy Bridge CPUs, NVIDIA Fermi C2070 GPUs, NVIDIA Kepler K20X GPUs, and the Intel(R) Xeon Phi co-processor (MIC). For the GPU platforms, we propose two parallel strategies to explore the performance optimization opportunities for our stencil kernels. For Sandy Bridge CPUs and MIC, we also employ various optimization techniques in order to achieve the best performance. Our stencil, with 114 component variables, poses several significant challenges for performance optimization, and its low ratio of computation to memory access makes it difficult to take full advantage of the evaluated architectures. Nevertheless, we achieve performance efficiencies ranging from 4.730% to 20.02% of the theoretical peak. We also conduct cross-platform performance and power analysis (focusing on Kepler GPU and MIC); the results could help users select the most suitable accelerators for their target applications.
Pages: 301-318
Page count: 18