From Describing to Prescribing Parallelism: Translating the SPEC ACCEL OpenACC Suite to OpenMP Target Directives

Cited by: 13
Authors
Juckeland, Guido [1, 2]
Hernandez, Oscar [1, 3]
Jacob, Arpith C. [1, 4]
Neilson, Daniel [1, 5]
Larrea, Veronica G. Vergara [1, 3]
Wienke, Sandra [1, 6]
Bobyr, Alexander [1, 7]
Brantley, William C. [1, 8]
Chandrasekaran, Sunita [1, 9]
Colgrove, Mathew [1, 10]
Grund, Alexander [1, 2]
Henschel, Robert [1, 11]
Joubert, Wayne [1, 3]
Mueller, Matthias S. [1, 6]
Raddatz, Dave [1, 12]
Shelepugin, Pavel [1, 7]
Whitney, Brian [1, 13]
Wang, Bo [1, 6]
Kumaran, Kalyan [1, 14]
Affiliations
[1] SPEC HPG, Gainesville, FL 32611 USA
[2] HZDR, Dresden, Germany
[3] Oak Ridge Natl Lab, Oak Ridge, TN USA
[4] IBM TJ Watson Res Ctr, Yorktown Hts, NY USA
[5] IBM Corp, Markham, ON, Canada
[6] Rhein Westfal TH Aachen, Aachen, Germany
[7] Intel, Nizhnii Novgorod, Russia
[8] AMD, Austin, TX USA
[9] Univ Delaware, Newark, DE USA
[10] NVIDIA, Santa Clara, CA USA
[11] Indiana Univ, Bloomington, IN USA
[12] SGI, Milpitas, CA USA
[13] Oracle, Redwood Shores, CA USA
[14] Argonne Natl Lab, Lemont, IL USA
Source
HIGH PERFORMANCE COMPUTING, ISC HIGH PERFORMANCE 2016 INTERNATIONAL WORKSHOPS | 2016 / Vol. 9945
Keywords
SPEC; SPEC ACCEL; OpenMP; OpenACC; Offloading
DOI
10.1007/978-3-319-46079-6_33
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Current and next-generation HPC systems will exploit accelerators and self-hosting devices within their compute nodes to accelerate applications. This comes at a time when programmer productivity and the ability to produce portable code have been recognized as major concerns. One of the goals of OpenMP and OpenACC is to allow the user to specify parallelism via directives so that compilers can generate device-specific code and optimizations. However, the challenge of porting codes becomes more complex because of the different types of parallelism and memory hierarchies available on different architectures. In this paper we discuss our experience with porting the SPEC ACCEL benchmarks from OpenACC to OpenMP 4.5 using a performance-portable style that lets the compiler make platform-specific optimizations to achieve good performance on a variety of systems. The ported SPEC ACCEL OpenMP benchmarks were validated on different platforms, including Xeon Phi, GPUs, and CPUs. We believe that this experience can help the community and compiler vendors understand how users plan to write OpenMP 4.5 applications in a performance-portable style.
Pages: 470-488
Page count: 19