Analysis of OpenMP 4.5 Offloading in Implementations: Correctness and Overhead

Cited by: 14
Authors
Diaz, Jose Monsalve [1]
Friedline, Kyle [1]
Pophale, Swaroop [2]
Hernandez, Oscar [2]
Bernholdt, David E. [2]
Chandrasekaran, Sunita [1]
Affiliations
[1] Univ Delaware, 18 Amstel Ave, Newark, DE 19716 USA
[2] Oak Ridge Natl Lab, 1 Bethel Valley Rd, Oak Ridge, TN 37831 USA
Keywords
OpenMP; 4.5; Offloading; Overhead measurement; SUITE
DOI
10.1016/j.parco.2019.102546
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
The OpenMP language features have been evolving to keep pace with rapid developments in hardware platforms. This paper evaluates implementations of OpenMP 4.5 target offload features in compilers such as Clang, XL and GCC, which are an integral part of the software stack on supercomputers and clusters. We use Summit (the top supercomputer in the world as of November 2018) as one of our experimental platforms. Such an evaluation is particularly critical on supercomputers like Summit, which application developers use widely to run their scientific codes at scale. Our tests not only evaluate the OpenMP implementations but also expose ambiguities within the OpenMP 4.5 specification. We also assess the overhead of the different OpenMP runtimes in relation to the different directives and clauses. This helps in assessing the interaction of different OpenMP directives independently of other application artifacts. We are aware that the implementations are constantly evolving and that Summit is advertised as having only partial OpenMP 4.x support. This is a synergistic effort to help identify and fix bugs in the implementations of features that applications require, and thereby prevent deployment delays later on. Going forward, we also plan to work with standard benchmarking organizations such as SPEC/HPG to donate our tests and mini-apps/kernels for potential inclusion in future releases of the SPEC benchmark suite. (C) 2019 Elsevier B.V. All rights reserved.
Pages: 13