A cache error propagation model

Cited by: 12
Authors
Somani, AK [1]
Trivedi, KS [1]
Affiliation
[1] Iowa State Univ, Dept Elect & Comp Engn, Ames, IA 50011 USA
Source
PACIFIC RIM INTERNATIONAL SYMPOSIUM ON FAULT-TOLERANT SYSTEMS, PROCEEDINGS | 1997
Keywords
cache memory system; cache error propagation; cache error recovery; fault injection; latent faults in cache memory systems; Markov models
DOI
10.1109/PRFTS.1997.640119
Chinese Library Classification
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Cache memory is a small, fast memory system that holds frequently used data. As processor speeds increase, aggressive design practices raise the probability of faults and of latent errors, because the processor allows only a short duration for reads and writes. A fault may corrupt the cache memory system or lead to an erroneous internal CPU state. In this paper we investigate error propagation in the cache memory system due to transient faults in the cache memory itself, in the processor's registers, or in both. The information gained from such an investigation should lead to the development of more effective error recovery mechanisms against failures due to transient faults arising in the machine's cache memory and register set. We establish that although the computer system is capable of recovering about 50% of the time from the effect of a single erroneous cache location or processor register, the other 50% of the time error recovery is effected only through specific recovery mechanisms. Our results are obtained using both a discrete-time Markov model and error injection on a real system.
Pages: 15-21
Page count: 7