Registers;
Out of order;
Graphics processing units;
Relays;
Microarchitecture;
Dynamic scheduling;
Decoding;
Energy efficiency;
GPU;
instruction-level parallelism;
microarchitecture;
out-of-order execution;
DOI:
10.1109/LCA.2023.3289317
Chinese Library Classification (CLC) number:
TP3 [Computing technology, computer technology];
Discipline classification code:
0812;
Abstract:
A graphics processing unit (GPU) is a processor that achieves high throughput by exploiting data parallelism. We found that many GPU workloads also contain instruction-level parallelism, which out-of-order execution can extract for additional performance. We propose the TURBULENCE architecture for very low-cost out-of-order execution on GPUs. TURBULENCE consists of a novel ISA, which references operands by inter-instruction distance instead of by register number, and a novel microarchitecture that executes this ISA. Distance-based operands do not cause false dependencies. By exploiting this property, we achieve cost-effective out-of-order execution on GPUs without introducing expensive hardware such as rename logic and a load-store queue. Simulation results show that TURBULENCE improves performance by 17.6% over an existing GPU without increasing energy consumption.
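The idea of distance-based operand referencing can be illustrated with a small sketch. The code below is not the TURBULENCE ISA or its encoding, which the abstract does not specify. It is a hypothetical toy model, with invented names (`to_distance_form`, `count_false_dependencies`, the register names `r1` to `r3`), that rewrites a register-named instruction sequence so each source operand names its producer by how many instructions back it appears. Because no destination name is ever reused in that form, the write-after-read and write-after-write hazards present in the register-named version have nothing to attach to.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy instruction: (destination register, list of source registers).
RegInstr = Tuple[str, List[str]]


def to_distance_form(program: List[RegInstr]) -> List[List[Optional[int]]]:
    """Replace each source register with the distance (in instructions) back
    to its most recent producer. None marks a live-in value that has no
    producer inside this program."""
    last_writer: Dict[str, int] = {}  # register -> index of its latest writer
    result: List[List[Optional[int]]] = []
    for i, (dest, srcs) in enumerate(program):
        result.append(
            [i - last_writer[s] if s in last_writer else None for s in srcs]
        )
        last_writer[dest] = i
    return result


def count_false_dependencies(program: List[RegInstr]) -> Tuple[int, int]:
    """Count (WAR, WAW) hazards in the register-named program: writes to a
    register that an earlier instruction read or wrote, respectively."""
    read_before: Set[str] = set()
    written_before: Set[str] = set()
    war = waw = 0
    for dest, srcs in program:
        if dest in read_before:
            war += 1
        if dest in written_before:
            waw += 1
        read_before.update(srcs)
        written_before.add(dest)
    return war, waw


# Assumed example: r1 is reused at index 2, creating false dependencies.
program: List[RegInstr] = [
    ("r1", []),            # 0: r1 = load
    ("r2", ["r1", "r1"]),  # 1: r2 = r1 + r1
    ("r1", []),            # 2: r1 = load   (reuses the name r1)
    ("r3", ["r1", "r2"]),  # 3: r3 = r1 + r2
]

print(to_distance_form(program))          # [[], [1, 1], [], [1, 2]]
print(count_false_dependencies(program))  # (1, 1)
```

In the register-named version, instruction 2 must wait for instruction 1 to read `r1` unless the hardware renames registers. In the distance form, instruction 3 simply refers to "the result 1 instruction back" and "the result 2 instructions back", so only true data dependencies remain.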