Balancing Reliability, Cost, and Performance Tradeoffs with FreeFault

Cited by: 0
Authors
Kim, Dong Wan [1]
Erez, Mattan [1]
Affiliations
[1] Univ Texas Austin, Dept Elect & Comp Engn, Austin, TX 78712 USA
Source
2015 IEEE 21ST INTERNATIONAL SYMPOSIUM ON HIGH PERFORMANCE COMPUTER ARCHITECTURE (HPCA) | 2015
Funding
National Science Foundation (USA);
Keywords
MEMORY;
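DOI
Not available
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract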
Memory errors have been a major source of system failures, and fault rates may rise even further as memory continues to scale. This increasing fault rate, especially when combined with the advent of integrated on-package memories, may exceed the capabilities of traditional fault-tolerance mechanisms or significantly increase their overhead. In this paper, we present FreeFault, a hardware-only, transparent, and nearly free resilience mechanism that is implemented entirely within a processor and can tolerate the majority of DRAM faults. FreeFault repurposes portions of the last-level cache for storing retired memory regions and augments a hardware memory scrubber to monitor memory health and aid retirement decisions. Because it relies on existing structures (cache associativity) for retirement/remapping-type repair, FreeFault has essentially no hardware overhead. Because it requires only a very modest portion of the cache (as small as 8KB) to cover a large fraction of DRAM faults, FreeFault has almost no impact on performance. We explain how FreeFault adds an attractive layer to the overall resilience scheme of highly reliable and highly available systems by delaying, and even entirely avoiding, calls upon software to make tradeoff decisions between memory capacity, performance, and reliability.
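The retirement idea described in the abstract can be illustrated with a toy model. This is only a rough sketch under stated assumptions, not the paper's design: the class name `ToyRetirementCache`, the 64-byte line size, the per-line error threshold, and the per-set lock limit are all hypothetical choices made for illustration. The abstract itself says only that last-level cache space holds retired memory regions and that a scrubber aids retirement decisions.

```python
from collections import defaultdict

LINE_BYTES = 64  # assumed cache-line size; not stated in the abstract


class ToyRetirementCache:
    """Toy model of pinning cache lines that stand in for faulty DRAM lines."""

    def __init__(self, num_sets=4, max_locked_per_set=1, error_threshold=2):
        # All three parameters are illustrative assumptions.
        self.num_sets = num_sets
        self.max_locked_per_set = max_locked_per_set
        self.error_threshold = error_threshold
        self.locked = defaultdict(set)        # set index -> pinned line numbers
        self.error_counts = defaultdict(int)  # line number -> errors seen by scrubber

    def _line(self, addr):
        return addr // LINE_BYTES

    def _set_index(self, line):
        return line % self.num_sets

    def is_retired(self, addr):
        """True if accesses to addr would be served from a pinned cache line."""
        line = self._line(addr)
        return line in self.locked[self._set_index(line)]

    def report_scrub_error(self, addr):
        """Record an error found by a (hypothetical) scrubber.

        Returns True if the line containing addr is retired into the cache.
        """
        if self.is_retired(addr):
            return True
        line = self._line(addr)
        self.error_counts[line] += 1
        if self.error_counts[line] < self.error_threshold:
            return False  # not enough evidence of a persistent fault yet
        set_idx = self._set_index(line)
        if len(self.locked[set_idx]) >= self.max_locked_per_set:
            return False  # set has no spare capacity; another mechanism must handle it
        self.locked[set_idx].add(line)
        return True

    def locked_bytes(self):
        """Total cache capacity currently given up to retired lines."""
        return sum(len(lines) for lines in self.locked.values()) * LINE_BYTES
```

In this sketch, a line is retired only after repeated scrubber reports, and each cache set gives up at most a fixed number of ways, which mirrors the abstract's point that only a small share of the cache is sacrificed. When a set has no spare capacity, the fault is left to other layers of the resilience scheme.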
Pages: 439-450
Page count: 12