A User-Guided Locking API for the OpenMP* Application Program Interface

Cited by: 4
Authors
Bae, Hansang [1]
Cownie, James [2]
Klemm, Michael [3]
Terboven, Christian [4]
Affiliations
[1] Software and Services Group, Intel Corporation, Champaign, IL
[2] Software and Services Group, Intel Corporation (UK) Ltd., Bristol
[3] Software and Services Group, Intel GmbH, Feldkirchen
[4] IT Center, RWTH Aachen University, Aachen
Source
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | 2014 / Vol. 8766
Keywords
Intel TSX; Lock elision; Locks; OpenMP; Speculative locks
DOI
10.1007/978-3-319-11454-5_13
Abstract
Although the OpenMP API specification defines a set of runtime routines for simple and nested locks, there is no standardized way to select different lock implementations. Programmers have to use vendor extensions to globally alter the lock implementation for the application; fine-grained control is not possible. Proper use of hardware-based speculative locks can achieve significant runtime improvements but, if used inappropriately, they can lead to severe performance penalties. Thus, programmers need to be able to explicitly choose the right lock implementation on a per-lock basis. In this paper, we extend the OpenMP API for locks with functions to provide such hints to the implementation. We also extend the syntax and semantics of the critical construct with clauses to contain hints. Our performance results for micro-benchmarks show that the runtime selection of lock implementations does not add any noticeable overhead. We also show that using an appropriate runtime hint can improve application performance. © 2014 Springer International Publishing Switzerland.
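To make the per-lock hints described in the abstract concrete, here is a minimal sketch in C. It uses the names found in OpenMP 4.5 (omp_init_lock_with_hint, the omp_lock_hint_* constants, and the hint clause on a named critical construct); the names and semantics proposed in the paper itself may differ. The two helper functions and the serial fallback macros are hypothetical and exist only for this illustration.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallback (an assumption of this sketch, not part of OpenMP):
   without -fopenmp the pragmas are ignored and the locks become no-ops. */
typedef int omp_lock_t;
#define omp_lock_hint_speculative 8
#define omp_init_lock_with_hint(lock, hint) ((void)(*(lock) = (hint)))
#define omp_set_lock(lock)     ((void)(lock))
#define omp_unset_lock(lock)   ((void)(lock))
#define omp_destroy_lock(lock) ((void)(lock))
#endif

/* Hypothetical helper: increments a shared counter n times under a lock
   that was initialized with a "speculative" hint. The runtime is free to
   ignore the hint; correctness does not depend on it. */
long count_with_hinted_lock(int n)
{
    omp_lock_t lock;
    long counter = 0;

    /* Per-lock choice: only this lock asks for a speculative implementation. */
    omp_init_lock_with_hint(&lock, omp_lock_hint_speculative);

    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        omp_set_lock(&lock);
        counter++;
        omp_unset_lock(&lock);
    }

    omp_destroy_lock(&lock);
    return counter;
}

/* Hypothetical helper: the same update, protected by a named critical
   construct that carries a hint clause (OpenMP 4.5 requires a name
   whenever a hint is given). */
long count_with_hinted_critical(int n)
{
    long counter = 0;

    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        #pragma omp critical (counter_update) hint(omp_lock_hint_contended)
        counter++;
    }
    return counter;
}
```

Built with an OpenMP 4.5 compiler (for example, gcc -fopenmp), calling count_with_hinted_lock(1000) or count_with_hinted_critical(1000) should return 1000 whichever lock implementation the runtime selects, since a hint affects only performance and never the mutual-exclusion guarantee.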
Pages: 173-186
Page count: 13