Repurposing GPU Microarchitectures with Light-Weight Out-Of-Order Execution

Cited by: 4
Authors
Iliakis, Konstantinos [1]
Xydis, Sotirios [3]
Soudris, Dimitrios [2]
Affiliations
[1] Natl Tech Univ Athens, Athens 15780, Greece
[2] Natl Tech Univ Athens, Dept Comp Sci, Sch Elect & Comp Engn, Athens 15780, Greece
[3] Harokopio Univ Athens, Dept Informat & Telemat, Athens 17671, Greece
Funding
EU Horizon 2020
Keywords
Graphics processing units; Kernel; Out of order; Computer architecture; Parallel processing; Context; Resource management; General purpose GPU; micro-architecture; out-of-order execution; instruction level parallelism; parallel systems;
DOI
10.1109/TPDS.2021.3093231
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
The GPU is the dominant platform for accelerating general-purpose workloads due to its computing capacity and cost-efficiency. GPU applications cover an ever-growing range of domains. To achieve high throughput, GPUs rely on massive multi-threading and fast context switching to overlap computations with memory operations. We observe that among the diverse GPU workloads, there exists a significant class of kernels that fail to maintain a sufficient number of active warps to hide the latency of memory operations, and thus suffer from frequent stalling. We argue that the dominant Thread-Level Parallelism model is not enough to efficiently accommodate the variability of modern GPU applications. To address this inherent inefficiency, we propose a novel micro-architecture with lightweight Out-Of-Order execution capability, enabling Instruction-Level Parallelism to complement the conventional Thread-Level Parallelism model. To minimize the hardware overhead, we carefully design our extension to heavily reuse the existing micro-architectural structures and study various design trade-offs to contain the overall area and power overhead while providing improved performance. We show that the proposed architecture outperforms traditional platforms by 23 percent on average for low-occupancy kernels, with an area and power overhead of 1.29 and 10.05 percent, respectively. Finally, we establish the potential of our proposal as a micro-architecture alternative by providing a 16 percent speedup over a wide collection of 60 general-purpose kernels.
Pages: 388-402
Page count: 15