Using hardware-transactional-memory support to implement speculative task execution

Cited by: 0
Authors
Salamanca, Juan [1]
Baldassin, Alexandro [2]
Affiliations
[1] Univ Campinas UNICAMP, Campinas, Brazil
[2] Sao Paulo State Univ Unesp, Dept Stat Appl Math & Comp DEMAC IGCE, Sao Paulo, Brazil
Keywords
Speculative task execution; Hardware transactional memory; Speculative taskloop; LEVEL SPECULATION; PRIVATIZATION;
DOI
10.1016/j.jpdc.2024.104939
Chinese Library Classification number
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
Loops account for most of the execution time of computer programs, so optimizing them to run as fast as possible is an ongoing task. This task is far from trivial; it remains an open area of research because many irregular loops are hard to parallelize. These loops generally have loop-carried (DOACROSS) dependencies, and whether a dependency actually occurs can depend on the context. Many techniques have been studied to parallelize such loops efficiently; however, the OpenMP standard, for example, offers no efficient way to parallelize them. This article presents Speculative Task Execution (STE), a technique that executes OpenMP tasks speculatively to accelerate certain hot-code regions (such as loops) marked by OpenMP directives. It also presents a detailed analysis of the use of Hardware Transactional Memory (HTM) support for executing tasks speculatively and describes a careful evaluation of an implementation of STE using HTM on modern machines. In particular, we consider the scenario in which speculative tasks are generated by the OpenMP taskloop construct (Speculative Taskloop, STL). As a result, the article provides evidence to support several important claims about the performance of STE over HTM on modern processor architectures. Experimental results reveal that: (a) by implementing STL on top of HTM for hot-code regions, speed-ups of up to 5.39x can be obtained on IBM POWER8 and of up to 2.41x on Intel processors using 4 cores; and (b) STL-ROT, a variant of STL using rollback-only transactions (ROTs), achieves speed-ups of up to 17.70x on an IBM POWER9 processor using 20 cores.
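The idea of running loop iterations speculatively and committing them in order can be illustrated with a small software emulation. This is only a sketch under stated assumptions: it is not the article's STL runtime, it uses neither OpenMP nor real HTM, and the names `SpecTask`, `run_speculative`, `window`, and `demo` are hypothetical. Read sets and write buffers kept in Python stand in for the conflict tracking that the hardware would perform.

```python
class SpecTask:
    """One speculative task: buffers its writes and records what it read."""

    def __init__(self, memory):
        self.memory = memory
        self.read_set = set()
        self.write_buf = {}

    def load(self, addr):
        if addr in self.write_buf:        # see our own speculative write
            return self.write_buf[addr]
        self.read_set.add(addr)
        return self.memory[addr]

    def store(self, addr, value):
        self.write_buf[addr] = value      # not visible until commit


def run_speculative(memory, body, n_iters, window=4):
    """Run body(task, i) for i in range(n_iters), `window` iterations at a time.

    All tasks of a window execute against the same memory state (as if they
    ran concurrently) and then commit in iteration order. A task whose read
    set overlaps an address committed by an earlier task of the same window
    is aborted and re-executed on up-to-date memory.
    Returns the number of aborts.
    """
    aborts = 0
    for start in range(0, n_iters, window):
        iters = range(start, min(start + window, n_iters))
        tasks = []
        for i in iters:                   # speculative execution phase
            task = SpecTask(memory)
            body(task, i)
            tasks.append(task)
        dirty = set()                     # addresses committed in this window
        for i, task in zip(iters, tasks): # in-order commit phase
            if task.read_set & dirty:     # read a stale value: abort, retry
                aborts += 1
                task = SpecTask(memory)
                body(task, i)
            for addr, value in task.write_buf.items():
                memory[addr] = value
            dirty.update(task.write_buf)
    return aborts


def demo():
    # Irregular loop: the dependence appears only where idx repeats.
    idx = [0, 1, 2, 3, 4, 4, 5, 6]        # iterations 4 and 5 share slot 4

    def body(task, i):
        task.store(idx[i], task.load(idx[i]) + i)

    spec = [0] * 7
    aborts = run_speculative(spec, body, len(idx), window=4)

    seq = [0] * 7                         # sequential reference result
    for i in range(len(idx)):
        seq[idx[i]] += i
    return spec, seq, aborts
```

Calling `demo()` compares the speculative result with a plain sequential run of the same loop. In this emulation only the iteration that reads a location written by an earlier iteration of the same window is re-executed, which is the behavior that makes speculation attractive when dependencies are rare.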
Pages: 19