Authors:
Siddiqua, Taniya [1]
Sridharan, Vilas [1]
Raasch, Steven E. [2]
DeBardeleben, Nathan [3]
Ferreira, Kurt B. [4]
Levy, Scott [4]
Baseman, Elisabeth [3]
Guan, Qiang [3]
Affiliations:
[1] Adv Micro Devices Inc, RAS Architecture, Sunnyvale, CA 94088 USA
[2] AMD Res, Boxboro, MA USA
[3] Los Alamos Natl Lab, Ultrascale Syst Res Ctr, Los Alamos, NM USA
[4] Sandia Natl Labs, Ctr Comp Res, Livermore, CA 94550 USA
Source:
2017 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT), 2017
DOI:
Not available
Chinese Library Classification:
TP3 [Computing technology, computer technology]
Discipline classification code:
0812
Abstract:
To provide high system resilience, it is important to understand the nature of the faults that occur in the field. This study analyzes fault rates from a production system that was monitored for five years, capturing data for the entire operational lifetime of the system. The data show that devices in this system did not show any sign of aging during the monitoring period, suggesting that the lifetime of a system may be longer than five years. In DRAM, the relative incidence of fault modes changed insignificantly over the system's lifetime: the relative rate of each fault mode at the end of the system's lifetime was within 1.4 percentage points of the rate observed during the first year. SRAM caches in the system exhibited several fault modes, including cache-way faults and single-bit faults. Overall, this study provides insights into how fault modes and types in a system evolve over the system's lifetime.