GPU Architecture Aware Instruction Scheduling for Improving Soft-Error Reliability

Cited by: 5
Authors
Lee H. [1]
Al Faruque M.A. [1]
Affiliation
[1] Department of Electrical Engineering and Computer Science, University of California, Irvine, CA 92697
Keywords
compiler; GPGPU; instruction scheduling; reliability; soft-error
DOI
10.1109/TMSCS.2017.2667661
Abstract
The demand for low-power and high-performance computing has been driving the semiconductor industry for decades. The semiconductor technology has been scaled down to satisfy these demands. At the same time, the semiconductor technology has faced severe reliability challenges like soft-error. Research has been conducted to improve the soft-error reliability of the GPU, which has been improved by using various methodologies such as redundancy methodologies. However, the GPU compiler has yet to be considered for improving the soft-error reliability of the GPU. In this paper, in order to improve the soft-error reliability of the GPU, we propose a novel GPU architecture aware compilation methodology. The proposed methodology jointly considers the parallel behavior of the GPU hardware and the applications, and minimizes the vulnerability of the GPU applications during instruction scheduling. In addition, the proposed methodology is able to complement any hardware based soft-error reliability improvement techniques. We compared our compilation methodology with the state-of-the-art soft-error reliability aware techniques and the performance aware instruction scheduling. We have injected the soft-errors during the experiments and have compared the number of correct executions that have no erroneous output. Our methodology requires less performance and power overhead than the state-of-the-art soft-error reliability methodologies in most cases. Compilation time overhead of our methodology is 8.13 seconds on average. The experimental results show that our methodology improves the soft-error reliability by 23 percent and 12 percent (up to 64 percent and 52 percent) compared to the state-of-the-art soft-error reliability and performance aware compilation techniques, respectively. Moreover, we have shown that the soft-error reliability of a GPU is not related to the performance, but to the fine-grained timing behavior of an application. © 2015 IEEE.
Pages: 86-99
Number of pages: 13
Related papers
50 results in total
  • [1] PAIS: Parallelization Aware Instruction Scheduling for Improving Soft-error Reliability of GPU-based Systems
    Lee, Haeseung
    Chen, Hsinchung
    Al Faruque, Mohammad Abdullah
    PROCEEDINGS OF THE 2016 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION (DATE), 2016, : 1568 - 1573
  • [2] Modeling Soft-Error Reliability Under Variability
    Balakrishnan, Aneesh
    Medeiros, Guilherme Cardoso
    Gursoy, Cemil Cem
    Hamdioui, Said
    Jenihhin, Maksim
    Alexandrescu, Dan
    34TH IEEE INTERNATIONAL SYMPOSIUM ON DEFECT AND FAULT TOLERANCE IN VLSI AND NANOTECHNOLOGY SYSTEMS (DFT 2021), 2021,
  • [3] Soft-error reliable architecture for future microprocessors
    Gopalakrishnan, Shoba
    Singh, Virendra
    IET COMPUTERS AND DIGITAL TECHNIQUES, 2019, 13 (03): : 233 - 242
  • [4] Resource Management for Improving Soft-Error and Lifetime Reliability of Real-Time MPSoCs
    Zhou, Junlong
    Sun, Jin
    Zhou, Xiumin
    Wei, Tongquan
    Chen, Mingsong
    Hu, Shiyan
    Hu, Xiaobo Sharon
    IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 2019, 38 (12) : 2215 - 2228
  • [5] Modeling and optimization for soft-error reliability of sequential circuits
    Miskov-Zivanov, Natasa
    Marculescu, Diana
    IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 2008, 27 (05) : 803 - 816
  • [6] RELIABILITY OF SEMICONDUCTOR RAMS WITH SOFT-ERROR SCRUBBING TECHNIQUES
    YANG, GC
    IEE PROCEEDINGS-COMPUTERS AND DIGITAL TECHNIQUES, 1995, 142 (05): : 337 - 344
  • [7] UnSync-CMP: Multicore CMP Architecture for Energy-Efficient Soft-Error Reliability
    Jeyapaul, Reiley
    Hong, Fei
    Rhisheekesan, Abhishek
    Shrivastava, Aviral
    Lee, Kyoungwoo
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2014, 25 (01) : 254 - 263
  • [8] Improving Testability and Soft-Error Resilience through Retiming
    Krishnaswamy, Smita
    Markov, Igor L.
    Hayes, John P.
    DAC: 2009 46TH ACM/IEEE DESIGN AUTOMATION CONFERENCE, VOLS 1 AND 2, 2009, : 508 - +
  • [9] Improving soft-error tolerance of FPGA configuration bits
    Srinivasan, S
    Gayasen, A
    Vijaykrishnan, N
    Kandemir, M
    Xie, Y
    Irwin, MJ
    ICCAD-2004: INTERNATIONAL CONFERENCE ON COMPUTER AIDED DESIGN, IEEE/ACM DIGEST OF TECHNICAL PAPERS, 2004, : 107 - 110
  • [10] Design and Analysis of Soft-Error Resilience Mechanisms for GPU Register File
    Mittal, Sparsh
    Wang, Haonan
    Jog, Adwait
    Vetter, Jeffrey S.
    2017 30TH INTERNATIONAL CONFERENCE ON VLSI DESIGN AND 2017 16TH INTERNATIONAL CONFERENCE ON EMBEDDED SYSTEMS (VLSID 2017), 2017, : 409 - 414