Understanding Error Propagation in GPGPU Applications

Cited by: 0
Authors
Li, Guanpeng [1]
Pattabiraman, Karthik [1]
Cher, Chen-Yong [2]
Bose, Pradip [2]
Affiliations
[1] Univ British Columbia, Vancouver, BC, Canada
[2] IBM TJ Watson Res Ctr, New York, NY USA
Source
SC '16: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS | 2016
Funding
Canada Foundation for Innovation; Natural Sciences and Engineering Research Council of Canada
Keywords
Fault Injection; Error Resilience; GPGPU; CUDA; Error Propagation; RESILIENCE
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
GPUs have emerged as general-purpose accelerators in high-performance computing (HPC) and scientific applications. However, the reliability characteristics of GPU applications have not been investigated in depth. While error propagation has been extensively investigated for non-GPU applications, GPU applications have a very different programming model which can have a significant effect on error propagation in them. We perform an empirical study to understand and characterize error propagation in GPU applications. We build a compiler-based fault-injection tool for GPU applications to track error propagation, and define metrics to characterize propagation in GPU applications. We find GPU applications exhibit significant error propagation for some kinds of errors, but not others, and the behaviour is highly application specific. We observe the GPU-CPU interaction boundary naturally limits error propagation in these applications compared to traditional non-GPU applications. We also formulate various guidelines for the design of fault-tolerance mechanisms in GPU applications based on our results.
Pages: 240-251
Number of pages: 12