Using hardware-transactional-memory support to implement speculative task execution

Authors
Salamanca, Juan [1]
Baldassin, Alexandro [2]
Affiliations
[1] Univ Campinas UNICAMP, Campinas, Brazil
[2] Sao Paulo State Univ Unesp, Dept Stat Appl Math & Comp DEMAC IGCE, Sao Paulo, Brazil
Keywords
Speculative task execution; Hardware transactional memory; Speculative taskloop; LEVEL SPECULATION; PRIVATIZATION;
DOI
10.1016/j.jpdc.2024.104939
Chinese Library Classification number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
Loops account for most of the execution time of computer programs, so optimizing them to run as fast as possible is a continuous task. This task is far from trivial; it remains an open area of research, since many irregular loops are hard to parallelize. These loops generally have loop-carried (DOACROSS) dependencies, and whether a dependency actually occurs can depend on the context. Many techniques have been studied to parallelize these loops efficiently; however, the OpenMP standard, for example, offers no efficient way to parallelize them. This article presents Speculative Task Execution (STE), a technique that enables OpenMP tasks to execute speculatively in order to accelerate certain hot-code regions (such as loops) marked by OpenMP directives. It also presents a detailed analysis of the use of Hardware Transactional Memory (HTM) support for executing tasks speculatively and describes a careful evaluation of an implementation of STE using HTM on modern machines. In particular, we consider the scenario in which speculative tasks are generated by the OpenMP taskloop construct (Speculative Taskloop, STL). As a result, the article provides evidence to support several important claims about the performance of STE over HTM on modern processor architectures. Experimental results reveal that: (a) by implementing STL on top of HTM for hot-code regions, speed-ups of up to 5.39x can be obtained on IBM POWER8 and of up to 2.41x on Intel processors using 4 cores; and (b) STL-ROT, a variant of STL using rollback-only transactions (ROTs), achieves speed-ups of up to 17.70x on an IBM POWER9 processor using 20 cores.
Pages: 19
Related papers
50 entries in total
  • [32] Mimosa: Protecting Private Keys Against Memory Disclosure Attacks Using Hardware Transactional Memory
    Li, Congwu
    Le Guan
    Lin, Jingqiang
    Luo, Bo
    Cai, Quanwei
    Jing, Jiwu
    Wang, Jing
    IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, 2021, 18 (03) : 1196 - 1213
  • [33] ASeD: Availability, Security, and Debugging Support using Transactional Memory
    Chung, JaeWoong
    Baek, Woongki
    Bronson, Nathan Grasso
    Seo, Jiwon
    Kozyrakis, Christos
    Olukotun, Kunle
    SPAA'08: PROCEEDINGS OF THE TWENTIETH ANNUAL SYMPOSIUM ON PARALLELISM IN ALGORITHMS AND ARCHITECTURES, 2008, : 366 - 366
  • [34] Locality-Adaptive Parallel Hash Joins Using Hardware Transactional Memory
    Shanbhag, Anil
    Pirk, Holger
    Madden, Sam
    DATA MANAGEMENT ON NEW HARDWARE, 2017, 10195 : 118 - 133
  • [35] Split Hardware Transactions: True nesting of transactions using best-effort hardware transactional memory
    Lev, Yossi
    Maessen, Jan-Willem
    PPOPP'08: PROCEEDINGS OF THE 2008 ACM SIGPLAN SYMPOSIUM ON PRINCIPLES AND PRACTICE OF PARALLEL PROGRAMMING, 2008, : 197 - 206
  • [36] NoMap: Speeding-Up JavaScript Using Hardware Transactional Memory
    Shull, Thomas
    Choi, Jiho
    Garzaran, Maria J.
    Torrellas, Josep
    2019 25TH IEEE INTERNATIONAL SYMPOSIUM ON HIGH PERFORMANCE COMPUTER ARCHITECTURE (HPCA), 2019, : 412 - 425
  • [37] TxRace: Efficient Data Race Detection Using Commodity Hardware Transactional Memory
    Zhang, Tong
    Lee, Dongyoon
    Jung, Changhee
    ACM SIGPLAN NOTICES, 2016, 51 (04) : 159 - 173
  • [38] Hardware support for extracting coarse-grain speculative parallelism in distributed shared-memory multiprocessors
    Figueiredo, RJ
    Fortes, JAB
    PROCEEDINGS OF THE 2001 INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING, 2001, : 214 - 223
  • [39] Strong and Efficient Cache Side-Channel Protection using Hardware Transactional Memory
    Gruss, Daniel
    Lettner, Julian
    Schuster, Felix
    Ohrimenko, Olga
    Haller, Istvan
    Costa, Manuel
    PROCEEDINGS OF THE 26TH USENIX SECURITY SYMPOSIUM (USENIX SECURITY '17), 2017, : 217 - 233
  • [40] Using Hardware Transactional Memory to Correct and Simplify a Readers-Writer Lock Algorithm
    Dice, Dave
    Lev, Yossi
    Luchangco, Victor
    Moir, Mark
    Liu, Yujie
    ACM SIGPLAN NOTICES, 2013, 48 (08) : 261 - 270