Repurposing GPU Microarchitectures with Light-Weight Out-Of-Order Execution

Cited by: 4

Authors:

Iliakis, Konstantinos ^{[1]}

Xydis, Sotirios ^{[3]}

Soudris, Dimitrios ^{[2]}

Affiliations:

[1] Natl Tech Univ Athens, Athens 15780, Greece

[2] Natl Tech Univ Athens, Dept Comp Sci, Sch Elect & Comp Engn, Athens 15780, Greece

[3] Harokopio Univ Athens, Dept Informat & Telemat, Athens 17671, Greece

Source:

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS | 2022 / Vol. 33 / No. 02

Funding:

European Union Horizon 2020;

Keywords:

Graphics processing units; Kernel; Out of order; Computer architecture; Parallel processing; Context; Resource management; General purpose GPU; micro-architecture; out-of-order execution; instruction level parallelism; parallel systems;

DOI:

10.1109/TPDS.2021.3093231

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods];

Discipline classification code:

081202;

Abstract:

GPU is the dominant platform for accelerating general-purpose workloads due to its computing capacity and cost-efficiency. GPU applications cover an ever-growing range of domains. To achieve high throughput, GPUs rely on massive multi-threading and fast context switching to overlap computations with memory operations. We observe that among the diverse GPU workloads, there exists a significant class of kernels that fail to maintain a sufficient number of active warps to hide the latency of memory operations, and thus suffer from frequent stalling. We argue that the dominant Thread-Level Parallelism model is not enough to efficiently accommodate the variability of modern GPU applications. To address this inherent inefficiency, we propose a novel micro-architecture with lightweight Out-Of-Order execution capability enabling Instruction-Level Parallelism to complement the conventional Thread-Level Parallelism model. To minimize the hardware overhead, we carefully design our extension to highly re-use the existing micro-architectural structures and study various design trade-offs to contain the overall area and power overhead, while providing improved performance. We show that the proposed architecture outperforms traditional platforms by 23 percent on average for low-occupancy kernels, with an area and power overhead of 1.29 and 10.05 percent, respectively. Finally, we establish the potential of our proposal as a micro-architecture alternative by providing 16 percent speedup over a wide collection of 60 general-purpose kernels.


Pages: 388-402

Page count: 15

Related records: 29 in total

[1] LOOG: Improving GPU Efficiency With Light-Weight Out-Of-Order Execution
Iliakis, Konstantinos
Xydis, Sotirios
Soudris, Dimitrios
IEEE COMPUTER ARCHITECTURE LETTERS, 2019, 18 (02) : 166 - 169
[2] TURBULENCE: Complexity-Effective Out-of-Order Execution on GPU With Distance-Based ISA
Matsuo, Reoma
Koizumi, Toru
Irie, Hidetsugu
Sakai, Shuichi
Shioya, Ryota
IEEE COMPUTER ARCHITECTURE LETTERS, 2024, 23 (02) : 175 - 178
[3] HAWS: Accelerating GPU Wavefront Execution through Selective Out-of-order Execution
Gong, Xun
Gong, Xiang
Yu, Leiming
Kaeli, David
ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION, 2019, 16 (02)
[4] TURBULENCE: Complexity-effective Out-of-order Execution on GPU with Distance-based ISA
Matsuo, Reoma
Koizumi, Toru
Irie, Hidetsugu
Sakai, Shuichi
Shioya, Ryota
2023 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION, DATE, 2023,
[5] INTERRUPT HANDLING FOR OUT-OF-ORDER EXECUTION PROCESSORS
TORNG, HC
DAY, M
IEEE TRANSACTIONS ON COMPUTERS, 1993, 42 (01) : 122 - 127
[6] The implementation of an out-of-order execution floating point unit
Luo, M
Bai, YQ
Shen, XB
Gao, DY
2004: 7TH INTERNATIONAL CONFERENCE ON SOLID-STATE AND INTEGRATED CIRCUITS TECHNOLOGY, VOLS 1- 3, PROCEEDINGS, 2004, : 1384 - 1387
[7] Formal Verification of Out-of-Order Execution with Incremental Flushing
Robert B. Jones
Jens U. Skakkebæk
David L. Dill
Formal Methods in System Design, 2002, 20 : 139 - 158
[8] Symbolic Predictive Cache Analysis for Out-of-Order Execution
Huang, Zunchen
Wang, Chao
FUNDAMENTAL APPROACHES TO SOFTWARE ENGINEERING, FASE 2022, 2022, 13241 : 163 - 183
[9] Automatic Refinement Checking of Pipelines with Out-of-Order Execution
Srinivasan, Sudarshan K.
IEEE TRANSACTIONS ON COMPUTERS, 2010, 59 (08) : 1138 - 1144
[10] Formal verification of out-of-order execution with incremental flushing
Jones, RB
Skakkebæk, JU
Dill, DL
FORMAL METHODS IN SYSTEM DESIGN, 2002, 20 (02) : 139 - 158
