GPU-based First Aid for System Faults

Cited by: 0
Authors
Kimura, Kento [1]
Kourai, Kenichi [1]
Affiliation
[1] Kyushu Inst Technol, Iizuka, Fukuoka, Japan
Keywords
fault recovery; GPUs; signals; scheduling; deadlocks
DOI
10.1145/3546591.3547526
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
It is difficult to completely avoid system failures in today's large-scale, complex systems. It is therefore important to detect system faults rapidly and accurately and to recover from them. Fault recovery can be performed externally from remote hosts or internally by processes or the operating system (OS) inside the target system. However, both approaches are themselves vulnerable to system faults. If fault recovery fails, a hardware reset is required, which can lead to the loss of system data and state. This paper proposes GPUfas, which recovers from system faults by indirectly controlling OS behavior from a GPU, a device that is not easily affected by system faults. GPUfas attempts fault recovery by rewriting OS data in main memory and leveraging the capabilities of the OS itself. For example, it can mimic signal sending and process scheduling to forcibly terminate processes that consume excessive resources. It can also mimic unlocking to recover from some kinds of deadlocks. We implemented GPUfas using the Linux kernel, CUDA, and LLVM so that a GPU can rewrite OS data transparently, and confirmed the effectiveness and efficiency of fault recovery by GPUfas.
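To make the "mimic, then let the OS act" idea concrete, here is a toy Python sketch. It is not GPUfas's implementation: plain objects stand in for OS data in main memory, and ordinary functions stand in for GPU-side writes. The names FakeTask, FakeLock, SIGPENDING_FLAG and the owner-match unlocking rule are hypothetical; real Linux structures differ, and the paper's CUDA/LLVM mechanism for transparent memory access is not modeled.

```python
from dataclasses import dataclass

SIGKILL = 9               # SIGKILL is signal 9 on Linux
SIGPENDING_FLAG = 0x1     # hypothetical "signal pending" flag (not a real kernel value)


@dataclass
class FakeTask:
    """Hypothetical, simplified stand-in for a kernel task record."""
    pid: int
    pending: int = 0      # bitmask; bit (n - 1) marks signal n as pending
    flags: int = 0
    runnable: bool = False
    alive: bool = True


@dataclass
class FakeLock:
    """Hypothetical lock word with an owner field."""
    locked: bool = False
    owner_pid: int = 0


# --- "GPU side": only rewrites data in memory, never runs OS code ---

def mimic_send_signal(task: FakeTask, sig: int) -> None:
    """Mark sig as pending and raise the flag the (toy) OS checks."""
    task.pending |= 1 << (sig - 1)
    task.flags |= SIGPENDING_FLAG


def mimic_schedule(task: FakeTask) -> None:
    """Make the task runnable so the OS gets a chance to act on the signal."""
    task.runnable = True


def mimic_unlock(lock: FakeLock, stuck_pid: int) -> bool:
    """Hypothetical rule: release a lock still held by a task stuck in a deadlock."""
    if lock.locked and lock.owner_pid == stuck_pid:
        lock.locked = False
        lock.owner_pid = 0
        return True
    return False


# --- "OS side": the OS's own logic acts on the rewritten data ---

def os_deliver_signals(task: FakeTask) -> None:
    """Toy delivery path: a runnable task with SIGKILL pending is terminated."""
    if task.runnable and task.flags & SIGPENDING_FLAG and task.pending & (1 << (SIGKILL - 1)):
        task.alive = False
        task.runnable = False


if __name__ == "__main__":
    hog = FakeTask(pid=4242)
    lock = FakeLock(locked=True, owner_pid=4242)
    mimic_send_signal(hog, SIGKILL)
    mimic_schedule(hog)
    os_deliver_signals(hog)
    released = mimic_unlock(lock, hog.pid)
    print(hog.alive, released, lock.locked)  # False True False
```

The split mirrors the abstract's point: the "GPU" functions never terminate anything themselves. They only set state, and the OS's own delivery path does the actual work once the task is scheduled.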
Pages: 38-45
Page count: 8
Related papers (50 in total)
  • [1] GPU-Based Discrete Element Modeling of Geological Faults
    Lisita, Vadim
    Kolyukhin, Dmitriy
    Tcheverda, Vladimir
    Volianskaia, Victoria
    Priimenko, Viatcheslav
    SUPERCOMPUTING (RUSCDAYS 2019), 2019, 1129 : 225 - 236
  • [2] A GPU-Based Fault Simulator for Small-Delay Faults
    Peng, Mingming
    Kuang, Jishun
    MATERIALS PROCESSING AND MANUFACTURING III, PTS 1-4, 2013, 753-755 : 2235 - +
  • [3] GPU-based Parallelization of System Modeling
    Pachnicke, S.
    2013 OPTICAL FIBER COMMUNICATION CONFERENCE AND EXPOSITION AND THE NATIONAL FIBER OPTIC ENGINEERS CONFERENCE (OFC/NFOEC), 2013
  • [4] A GPU-based Graph Pattern Mining System
    Hu, Lin
    Zou, Lei
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2022, 2022, : 4867 - 4871
  • [5] The GPU-based parallel Ant Colony System
    Skinderowicz, Rafal
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2016, 98 : 48 - 60
  • [6] Concurrent query processing in a GPU-based database system
    Li, Hao
    Tu, Yi-Cheng
    Zeng, Bo
    PLOS ONE, 2019, 14 (04)
  • [7] GRAP: Efficient GPU-Based Redundancy Analysis Using Parallel Evaluation for Cross Faults
    Shin, Seung Ho
    Lee, Hayoung
    Kang, Sungho
    IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 2024, 43 (08) : 2518 - 2531
  • [8] GPU-based composite subdivision
    Li, Guiqing
    Computer Aided Drafting, Design and Manufacturing, 2012, (03) : 50 - 60
  • [9] GPU-based Runtime Verification
    Berkovich, Shay
    Bonakdarpour, Borzoo
    Fischmeister, Sebastian
    IEEE 27TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS 2013), 2013, : 1025 - 1036
  • [10] GPU-Based Multilevel Clustering
    Chiosa, Iurie
    Kolb, Andreas
    IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2011, 17 (02) : 132 - 145