TURBULENCE: Complexity-Effective Out-of-Order Execution on GPU With Distance-Based ISA

Cited by: 0
Authors
Matsuo, Reoma [1]
Koizumi, Toru [1]
Irie, Hidetsugu [1]
Sakai, Shuichi [1]
Shioya, Ryota [1]
Affiliations
[1] Univ Tokyo, Tokyo 1138654, Japan
Keywords
Registers; Out of order; Graphics processing units; Relays; Microarchitecture; Dynamic scheduling; Decoding; Energy efficiency; GPU; instruction-level parallelism; microarchitecture; out-of-order execution
DOI
10.1109/LCA.2023.3289317
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
A graphics processing unit (GPU) is a processor that achieves high throughput by exploiting data parallelism. We found that many GPU workloads also contain instruction-level parallelism that can be extracted through out-of-order execution to provide additional performance improvement opportunities. We propose the TURBULENCE architecture for very low-cost out-of-order execution on GPUs. TURBULENCE consists of a novel ISA that references operands by inter-instruction distance instead of register numbers, and a novel microarchitecture that executes this ISA. Distance-based operands have the property of not causing false dependencies. By exploiting this property, we achieve cost-effective out-of-order execution on GPUs without introducing expensive hardware such as rename logic and a load-store queue. Simulation results show that TURBULENCE improves performance by 17.6% over an existing GPU without increasing energy consumption.
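The claim that distance-based operands do not cause false dependencies can be illustrated with a small sketch. The abstract does not describe the actual TURBULENCE instruction encoding, so everything below is an assumption made for illustration: the tuple formats, the helper names `false_hazards` and `true_dependencies`, and the four-instruction toy program are hypothetical. The sketch encodes the same computation twice, once with register names and once with distances, and lists the dependencies each form implies.

```python
from typing import List, Tuple

# Hypothetical register-numbered form: (destination, [sources]).
# Instruction 2 reuses r1, which creates false (name) dependencies.
REGISTER_PROGRAM: List[Tuple[str, List[str]]] = [
    ("r1", []),            # 0: r1 = load
    ("r2", ["r1"]),        # 1: r2 = r1 + r1
    ("r1", []),            # 2: r1 = load   (overwrites r1)
    ("r3", ["r1", "r2"]),  # 3: r3 = r1 + r2
]

# Hypothetical distance-based form of the same computation: each
# instruction lists how many instructions back its producers are.
# There is no destination name, so nothing can be overwritten.
DISTANCE_PROGRAM: List[List[int]] = [
    [],      # 0: load
    [1, 1],  # 1: add, both operands come from instruction 0
    [],      # 2: load
    [1, 2],  # 3: add, operands come from instructions 2 and 1
]


def false_hazards(program: List[Tuple[str, List[str]]]) -> List[Tuple[str, int, int]]:
    """List WAW and WAR pairs (kind, later, earlier) caused by register reuse."""
    hazards = []
    for i, (dst_i, _) in enumerate(program):
        for j in range(i):
            dst_j, srcs_j = program[j]
            if dst_i == dst_j:
                hazards.append(("WAW", i, j))
            if dst_i in srcs_j:
                hazards.append(("WAR", i, j))
    return hazards


def true_dependencies(program: List[List[int]]) -> List[Tuple[int, int]]:
    """List the (consumer, producer) pairs implied by the distances."""
    deps = set()
    for i, distances in enumerate(program):
        for d in distances:
            if d <= 0 or i - d < 0:
                raise ValueError(f"instruction {i}: invalid distance {d}")
            deps.add((i, i - d))
    return sorted(deps)


REGISTER_HAZARDS = false_hazards(REGISTER_PROGRAM)
DISTANCE_DEPS = true_dependencies(DISTANCE_PROGRAM)
```

In this toy example the register form yields a WAW and a WAR hazard on `r1`, which conventional out-of-order hardware would remove with register renaming. The distance form expresses only producer-to-consumer (read-after-write) edges, which is the property the abstract relies on to avoid rename logic.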
Pages: 175-178
Page count: 4