Analysis of OpenMP 4.5 Offloading in Implementations: Correctness and Overhead

Cited by: 14
Authors
Diaz, Jose Monsalve [1]
Friedline, Kyle [1]
Pophale, Swaroop [2]
Hernandez, Oscar [2]
Bernholdt, David E. [2]
Chandrasekaran, Sunita [1]
Affiliations
[1] Univ Delaware, 18 Amstel Ave, Newark, DE 19716 USA
[2] Oak Ridge Natl Lab, 1 Bethel Valley Rd, Oak Ridge, TN 37831 USA
Keywords
OpenMP 4.5; Offloading; Overhead measurement; SUITE
DOI
10.1016/j.parco.2019.102546
Chinese Library Classification number
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
OpenMP language features have been evolving to keep pace with the rapid development of hardware platforms. This paper evaluates implementations of OpenMP 4.5 target offload features in compilers such as Clang, XL and GCC, which are an integral part of the software harness on supercomputers and clusters. We use Summit (the top supercomputer in the world as of November 2018) as one of our experimental setups. Such an effort is particularly critical on these supercomputers because they are widely used by application developers to run scientific codes at scale. Our tests not only evaluate the OpenMP implementations but also expose ambiguities within the OpenMP 4.5 specification. We also assess the overhead of the different OpenMP runtimes in relation to the different directives and clauses. This helps in assessing the interaction of different OpenMP directives independent of other application artifacts. We are aware that the implementations are constantly evolving and that Summit is advertised as having only partial OpenMP 4.x support. This is a synergistic effort to help identify and fix bugs in the feature implementations that applications require, and to prevent deployment delays later on. Going forward, we also plan to interact with standard benchmarking organizations such as SPEC/HPG to donate our tests and mini-apps/kernels for potential inclusion in the next release of the SPEC benchmark suite. (C) 2019 Elsevier B.V. All rights reserved.
Pages: 13
Related papers
22 in total
  • [1] [Anonymous], 1986, TECHNICAL REPORT
  • [2] Bercea G., 2015, P 6 INT WORKSH PERF, P2
  • [3] Bull J.M., 2012, INT WORKSH OPENMP, P271, DOI 10.1007/978-3-642-30961-8_24
  • [4] GPU acceleration of a petascale application for turbulent mixing at high Schmidt number using OpenMP 4.5
    Clay, M. P.
    Buaria, D.
    Yeung, P. K.
    Gotoh, T.
    [J]. COMPUTER PHYSICS COMMUNICATIONS, 2018, 228 : 100 - 114
  • [5] Clay M.P., 2017, Improving scalability and accelerating petascale turbulence simulations using openmp
  • [6] Diaz J. M., 2018, P 47 INT C PARALLEL, P31
  • [7] PARALLEL LOOPS - A TEST SUITE FOR PARALLELIZING COMPILERS - DESCRIPTION AND EXAMPLE RESULTS
    DONGARRA, J
    FURTNEY, M
    REINHARDT, S
    RUSSELL, J
    [J]. PARALLEL COMPUTING, 1991, 17 (10-11) : 1247 - 1255
  • [8] Kokkos: Enabling manycore performance portability through polymorphic memory access patterns
    Edwards, H. Carter
    Trott, Christian R.
    Sunderland, Daniel
    [J]. JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2014, 74 (12) : 3202 - 3216
  • [9] Friedline G. Kyle, 2017, P P3MA WORKSH COL IS
  • [10] From Describing to Prescribing Parallelism: Translating the SPEC ACCEL OpenACC Suite to OpenMP Target Directives
    Juckeland, Guido
    Hernandez, Oscar
    Jacob, Arpith C.
    Neilson, Daniel
    Larrea, Veronica G. Vergara
    Wienke, Sandra
    Bobyr, Alexander
    Brantley, William C.
    Chandrasekaran, Sunita
    Colgrove, Mathew
    Grund, Alexander
    Henschel, Robert
    Joubert, Wayne
    Mueller, Matthias S.
    Raddatz, Dave
    Shelepugin, Pavel
    Whitney, Brian
    Wang, Bo
    Kumaran, Kalyan
    [J]. HIGH PERFORMANCE COMPUTING, ISC HIGH PERFORMANCE 2016 INTERNATIONAL WORKSHOPS, 2016, 9945 : 470 - 488