Repurposing GPU Microarchitectures with Light-Weight Out-Of-Order Execution

Cited by: 4
Authors
Iliakis, Konstantinos [1]
Xydis, Sotirios [3]
Soudris, Dimitrios [2]
Affiliations
[1] Natl Tech Univ Athens, Athens 15780, Greece
[2] Natl Tech Univ Athens, Dept Comp Sci, Sch Elect & Comp Engn, Athens 15780, Greece
[3] Harokopio Univ Athens, Dept Informat & Telemat, Athens 17671, Greece
Funding
European Union Horizon 2020
Keywords
Graphics processing units; Kernel; Out of order; Computer architecture; Parallel processing; Context; Resource management; General purpose GPU; micro-architecture; out-of-order execution; instruction level parallelism; parallel systems;
DOI
10.1109/TPDS.2021.3093231
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
The GPU is the dominant platform for accelerating general-purpose workloads due to its computing capacity and cost-efficiency. GPU applications cover an ever-growing range of domains. To achieve high throughput, GPUs rely on massive multi-threading and fast context switching to overlap computations with memory operations. We observe that among the diverse GPU workloads, there exists a significant class of kernels that fail to maintain a sufficient number of active warps to hide the latency of memory operations, and thus suffer from frequent stalling. We argue that the dominant Thread-Level Parallelism model is not enough to efficiently accommodate the variability of modern GPU applications. To address this inherent inefficiency, we propose a novel micro-architecture with lightweight Out-Of-Order execution capability, enabling Instruction-Level Parallelism to complement the conventional Thread-Level Parallelism model. To minimize the hardware overhead, we carefully design our extension to reuse the existing micro-architectural structures as much as possible, and we study various design trade-offs to contain the overall area and power overhead while providing improved performance. We show that the proposed architecture outperforms traditional platforms by 23 percent on average for low-occupancy kernels, with an area and power overhead of 1.29 and 10.05 percent, respectively. Finally, we establish the potential of our proposal as a micro-architecture alternative by providing a 16 percent speedup over a wide collection of 60 general-purpose kernels.
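The latency-hiding argument in the abstract can be illustrated with a toy back-of-the-envelope model. This is only a sketch under stated assumptions and is not the paper's evaluation methodology: the function name `issue_utilization`, its parameters, and the formula are hypothetical. It assumes each warp alternates between a burst of issue cycles and one memory stall, that stalls of different warps overlap perfectly, and that Instruction-Level Parallelism simply lets a warp issue several independent bursts before it has to wait.

```python
def issue_utilization(active_warps, compute_cycles, memory_latency,
                      independent_ops=1):
    """Toy estimate of the fraction of cycles in which a scheduler can issue.

    Assumptions (illustrative only, not taken from the paper):
      - each warp issues `compute_cycles` of work, then stalls for
        `memory_latency` cycles on a memory operation;
      - with `independent_ops` independent instruction groups, a warp can
        issue that many bursts back to back before it must wait;
      - stalls of different warps overlap perfectly.
    """
    if active_warps < 1 or independent_ops < 1:
        raise ValueError("active_warps and independent_ops must be >= 1")
    if compute_cycles <= 0 or memory_latency < 0:
        raise ValueError("compute_cycles must be > 0, memory_latency >= 0")

    busy = compute_cycles * independent_ops   # issue cycles per warp per period
    period = busy + memory_latency            # one issue-then-stall round
    return min(1.0, active_warps * busy / period)


if __name__ == "__main__":
    # Low occupancy: too few warps to cover the memory latency.
    print(issue_utilization(active_warps=4, compute_cycles=10, memory_latency=390))
    # Same occupancy, but each warp exposes four independent instruction groups.
    print(issue_utilization(active_warps=4, compute_cycles=10, memory_latency=390,
                            independent_ops=4))
    # High occupancy: thread-level parallelism alone saturates the scheduler.
    print(issue_utilization(active_warps=48, compute_cycles=10, memory_latency=390))
```

In this model, raising `independent_ops` at a fixed warp count increases utilization, which mirrors the abstract's claim that Instruction-Level Parallelism can complement Thread-Level Parallelism for low-occupancy kernels. The cycle counts used in the example are arbitrary placeholders.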
Pages: 388-402
Page count: 15