Analysis of OpenMP 4.5 Offloading in Implementations: Correctness and Overhead

Cited by: 14

Authors:

Diaz, Jose Monsalve [1]

Friedline, Kyle [1]

Pophale, Swaroop [2]

Hernandez, Oscar [2]

Bernholdt, David E. [2]

Chandrasekaran, Sunita [1]

Affiliations:

[1] Univ Delaware, 18 Amstel Ave, Newark, DE 19716 USA

[2] Oak Ridge Natl Lab, 1 Bethel Valley Rd, Oak Ridge, TN 37831 USA

Source:

PARALLEL COMPUTING | 2019 / Volume 89

Keywords:

OpenMP; 4.5; Offloading; Overhead measurement; SUITE;

DOI:

10.1016/j.parco.2019.102546

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods];

Discipline classification code:

081202 ;

Abstract:

The OpenMP language features have been evolving to keep pace with the rapid development of hardware platforms. This article evaluates implementations of the OpenMP 4.5 target offload features in compilers such as Clang, XL and GCC, which are an integral part of the software harness on supercomputers and clusters. We use Summit (the top supercomputer in the world as of November 2018) as one of our experimental setups. Such an effort is particularly critical on supercomputers of this kind, as they are widely used by application developers to run their scientific codes at scale. Our tests not only evaluate the OpenMP implementations but also expose ambiguities within the OpenMP 4.5 specification. We also assess the overhead that the different OpenMP runtimes incur for different directives and clauses. This helps in assessing the interaction of different OpenMP directives independent of other application artifacts. We are aware that the implementations are constantly evolving and that Summit is advertised as having only partial OpenMP 4.x support. This is a synergistic effort to help identify and fix bugs in the implementations of features that applications require, and to prevent deployment delays later on. Going forward, we also plan to interact with standard benchmarking organizations such as SPEC/HPG to donate our tests and mini-apps/kernels for potential inclusion in the next release of the SPEC benchmark suite. (C) 2019 Elsevier B.V. All rights reserved.

Pages: 13