GPU Architecture Aware Instruction Scheduling for Improving Soft-Error Reliability

Cited by: 5
Authors
Lee H. [1]
Al Faruque M.A. [1]
Affiliation
[1] Department of Electrical Engineering and Computer Science, University of California, Irvine, CA 92697
Keywords
compiler; GPGPU; instruction scheduling; reliability; soft-error
DOI
10.1109/TMSCS.2017.2667661
Abstract
The demand for low-power and high-performance computing has been driving the semiconductor industry for decades. The semiconductor technology has been scaled down to satisfy these demands. At the same time, the semiconductor technology has faced severe reliability challenges like soft-error. Research has been conducted to improve the soft-error reliability of the GPU, which has been improved by using various methodologies such as redundancy methodologies. However, the GPU compiler has yet to be considered for improving the soft-error reliability of the GPU. In this paper, in order to improve the soft-error reliability of the GPU, we propose a novel GPU architecture aware compilation methodology. The proposed methodology jointly considers the parallel behavior of the GPU hardware and the applications, and minimizes the vulnerability of the GPU applications during instruction scheduling. In addition, the proposed methodology is able to complement any hardware based soft-error reliability improvement techniques. We compared our compilation methodology with the state-of-the-art soft-error reliability aware techniques and the performance aware instruction scheduling. We have injected the soft-errors during the experiments and have compared the number of correct executions that have no erroneous output. Our methodology requires less performance and power overhead than the state-of-the-art soft-error reliability methodologies in most cases. Compilation time overhead of our methodology is 8.13 seconds on average. The experimental results show that our methodology improves the soft-error reliability by 23 percent and 12 percent (up to 64 percent and 52 percent) compared to the state-of-the-art soft-error reliability and performance aware compilation techniques, respectively. Moreover, we have shown that the soft-error reliability of a GPU is not related to the performance, but to the fine-grained timing behavior of an application. © 2015 IEEE.
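The abstract's closing claim, that soft-error reliability depends on fine-grained timing rather than on performance, can be illustrated with a toy model. The sketch below is not the authors' methodology. It assumes a common proxy for register vulnerability (the number of cycles a value sits in a register between its write and its last read), a single-issue machine with one instruction per cycle, and registers that are each written once. The function name `vulnerable_cycles` and both schedules are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# One instruction: (name, destination register or None, source registers).
Instr = Tuple[str, Optional[str], List[str]]


def vulnerable_cycles(schedule: List[Instr]) -> int:
    """Sum, over all registers, of the cycles between write and last read.

    Assumptions (illustrative only): one instruction issues per cycle and
    every register is written exactly once.
    """
    written: Dict[str, int] = {}
    last_read: Dict[str, int] = {}
    for cycle, (_, dst, srcs) in enumerate(schedule):
        for reg in srcs:
            if reg in written:
                last_read[reg] = cycle  # a later read extends the interval
        if dst is not None:
            written[dst] = cycle
    return sum(last_read[reg] - written[reg] for reg in last_read)


load_a: Instr = ("load_a", "r1", [])
load_b: Instr = ("load_b", "r2", [])
inc_a: Instr = ("inc_a", "r3", ["r1"])
inc_b: Instr = ("inc_b", "r4", ["r2"])
add: Instr = ("add", "r5", ["r3", "r4"])

# Two dependence-respecting orders of the same five instructions.
schedule_a = [load_a, load_b, inc_a, inc_b, add]
schedule_b = [load_a, inc_a, load_b, inc_b, add]

print(len(schedule_a), vulnerable_cycles(schedule_a))
print(len(schedule_b), vulnerable_cycles(schedule_b))
```

Both orders take five cycles in this model, yet they expose different numbers of register-cycles to a potential bit flip, which is the kind of difference a reliability-aware scheduler could exploit without changing execution time.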
Pages: 86-99
Number of pages: 13