Configurable-ECC: Architecting a Flexible ECC Scheme to Support Different Sized Accesses in High Bandwidth Memory Systems

Cited by: 11
Authors
Chen, Hsing-Min [1 ,2 ]
Lee, Shin-Ying [2 ,3 ]
Mudge, Trevor [4 ]
Wu, Carole-Jean [2 ]
Chakrabarti, Chaitali [2 ]
Affiliations
[1] Intel Corp, Santa Clara, CA 95054 USA
[2] Arizona State Univ, Sch Elect Comp & Energy Engn, Tempe, AZ 85287 USA
[3] Samsung, Austin, TX 78754 USA
[4] Univ Michigan, Dept Elect Engn & Comp Sci, Ann Arbor, MI 48109 USA
Funding
National Science Foundation (USA);
Keywords
3D DRAM; memory reliability; error control coding and GPU; ERROR; CODES; CACHE;
DOI
10.1109/TC.2018.2886884
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Designing error correction code (ECC) to guarantee strong reliability for high bandwidth memory (HBM) is imperative in high performance computers, especially for systems equipped with graphics processing units (GPUs). The design of ECC is challenging because future GPUs are expected to implement a memory subsystem supporting fine and coarse-grained data accesses to match the difference in the spatial locality of GPGPU applications. Current ECC designs, however, are developed for a fixed data fetch granularity. To have a more flexible design, we propose a novel memory protection scheme, called Config(urable)-ECC, which provides strong reliability for both fine and coarse-grained data accesses. Config-ECC consists of two tiers of ECC protection. The tier-1 code is a strong product code that can correct errors due to small granularity faults and detect errors caused by large granularity faults. The tier-2 code is an XOR-based code that is employed to correct errors incurred by large granularity faults. Config-ECC provides stronger reliability and/or lower energy consumption compared to state-of-the-art fixed 32B and 64B ECC schemes. It reduces the HBM energy by 17-21 percent while reducing the failure in time (FIT) rate by 20 times compared to a state-of-the-art fixed 64B ECC scheme with an insignificant 1.2 percent performance overhead.
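The tier-2 idea in the abstract, an XOR-based code that corrects what tier-1 can only detect, can be illustrated with a minimal sketch. This is not the paper's actual construction: the block size, the number of blocks per parity group, and the function names `make_parity` and `recover_block` are all hypothetical, and the "detection" step is simply assumed to have identified which block is bad.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(data_blocks):
    """Compute one XOR parity block over a group of data blocks (hypothetical layout)."""
    return xor_blocks(data_blocks)


def recover_block(blocks, parity, bad_index):
    """Rebuild the block at bad_index from the surviving blocks plus parity.

    Assumes a separate detection step (standing in for the tier-1 code)
    has already flagged exactly one block in the group as unrecoverable.
    """
    survivors = [b for i, b in enumerate(blocks) if i != bad_index]
    return xor_blocks(survivors + [parity])


# Four 32-byte blocks; sizes chosen only for illustration.
blocks = [bytes([value] * 32) for value in (1, 2, 3, 4)]
parity = make_parity(blocks)

# Simulate a large-granularity fault wiping out block 2.
damaged = list(blocks)
damaged[2] = bytes(32)
restored = recover_block(damaged, parity, bad_index=2)
```

Because XOR is its own inverse, XOR-ing the parity with every surviving block leaves exactly the missing block, which is why a single parity block suffices once the faulty block's location is known.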
Pages: 646-659
Page count: 14