Analysis of OpenMP 4.5 Offloading in Implementations: Correctness and Overhead

Cited by: 17

Authors:

Diaz, Jose Monsalve [1]

Friedline, Kyle [1]

Pophale, Swaroop [2]

Hernandez, Oscar [2]

Bernholdt, David E. [2]

Chandrasekaran, Sunita [1]

Affiliations:

[1] Univ Delaware, 18 Amstel Ave, Newark, DE 19716 USA

[2] Oak Ridge Natl Lab, 1 Bethel Valley Rd, Oak Ridge, TN 37831 USA

Source:

PARALLEL COMPUTING | 2019 / Vol. 89

Keywords:

OpenMP; 4.5; Offloading; Overhead measurement; SUITE;

DOI:

10.1016/j.parco.2019.102546

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods];

Discipline classification code:

081202;

Abstract:

OpenMP language features have been evolving to keep pace with the rapid development of hardware platforms. This paper evaluates implementations of OpenMP 4.5 target offload features in compilers such as Clang, XL, and GCC, which are an integral part of the software harness on supercomputers and clusters. We use Summit (the top-ranked supercomputer in the world as of November 2018) as one of our experimental setups. Such an effort is particularly critical on supercomputers of this kind, as they are widely used by application developers to run scientific codes at scale. Our tests not only evaluate the OpenMP implementations but also expose ambiguities within the OpenMP 4.5 specification. We also assess the overhead of the different OpenMP runtimes in relation to the different directives and clauses, which helps in assessing the interaction of different OpenMP directives independent of other application artifacts. We are aware that the implementations are constantly evolving and that Summit is advertised as having only partial OpenMP 4.x support. This is a synergistic effort to help identify and fix bugs in the implementations of features that applications require, and to prevent deployment delays later on. Going forward, we also plan to interact with standard benchmarking organizations such as SPEC/HPG to donate our tests and mini-apps/kernels for potential inclusion in future releases of the SPEC benchmark suite. (C) 2019 Elsevier B.V. All rights reserved.

References

Pages: 13

22 entries in total

[1] [Anonymous], 1986, TECHNICAL REPORT

[2] Bercea G., 2015, P 6 INT WORKSH PERF, P2

[3] Bull J.M., 2012, INT WORKSH OPENMP, P271, DOI 10.1007/978-3-642-30961-8_24

[4] Clay, M. P.; Buaria, D.; Yeung, P. K.; Gotoh, T. GPU acceleration of a petascale application for turbulent mixing at high Schmidt number using OpenMP 4.5 [J]. COMPUTER PHYSICS COMMUNICATIONS, 2018, 228: 100-114

[5] Clay M.P., 2017, Improving scalability and accelerating petascale turbulence simulations using OpenMP

[6] Diaz J. M., 2018, P 47 INT C PARALLEL, P31

[7] Dongarra, J.; Furtney, M.; Reinhardt, S.; Russell, J. Parallel Loops - A Test Suite for Parallelizing Compilers - Description and Example Results [J]. PARALLEL COMPUTING, 1991, 17 (10-11): 1247-1255

[8] Edwards, H. Carter; Trott, Christian R.; Sunderland, Daniel. Kokkos: Enabling manycore performance portability through polymorphic memory access patterns [J]. JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2014, 74 (12): 3202-3216

[9] Friedline G. Kyle, 2017, P P3MA WORKSH COL IS

[10] Juckeland, Guido; Hernandez, Oscar; Jacob, Arpith C.; Neilson, Daniel; Larrea, Veronica G. Vergara; Wienke, Sandra; Bobyr, Alexander; Brantley, William C.; Chandrasekaran, Sunita; Colgrove, Mathew; Grund, Alexander; Henschel, Robert; Joubert, Wayne; Mueller, Matthias S.; Raddatz, Dave; Shelepugin, Pavel; Whitney, Brian; Wang, Bo; Kumaran, Kalyan. From Describing to Prescribing Parallelism: Translating the SPEC ACCEL OpenACC Suite to OpenMP Target Directives [J]. HIGH PERFORMANCE COMPUTING, ISC HIGH PERFORMANCE 2016 INTERNATIONAL WORKSHOPS, 2016, 9945: 470-488
