CPU cache prefetching: Timing evaluation of hardware implementations

Cited by: 20
Authors
Tse, J
Smith, AJ
Affiliations
[1] Altera Corp, El Cerrito, CA 94530 USA
[2] Univ Calif Berkeley, Dept Elect Engn & Comp Sci, Div Comp Sci, Berkeley, CA 94720 USA
Funding
National Science Foundation (NSF); National Aeronautics and Space Administration (NASA)
Keywords
cache memory; prefetching; timing model; cache prefetching; CPU architecture; memory system design; CPU cache memory;
DOI
10.1109/12.677225
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Prefetching into CPU caches has long been known to be effective in reducing the cache miss ratio, but known implementations of prefetching have been unsuccessful in improving CPU performance. The reasons are that prefetches interfere with normal cache operations by keeping the cache address and data ports, the memory bus, and the memory banks busy, and that a prefetch is not necessarily complete by the time the prefetched data is actually referenced. In this paper, we present extensive quantitative results of a detailed cycle-by-cycle trace-driven simulation of a uniprocessor memory system in which we vary most of the relevant parameters in order to determine when and if hardware prefetching is useful. We find that, in order for prefetching to actually improve performance, the address array needs to be double ported and the data array needs to be either double ported or fully buffered. It is also very helpful for the bus to be very wide (e.g., 16 bytes), for bus transactions to be split, and for main memory to be interleaved. Under the best circumstances, i.e., with a significant investment in extra hardware, prefetching can significantly improve performance. For implementations without adequate hardware, prefetching often decreases performance.
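The interference effect described in the abstract can be illustrated with a toy cycle-counting model. The sketch below is not the paper's simulator: it assumes a direct-mapped cache, a next-line prefetch issued whenever the following line is absent, and a single cache port that each prefetch keeps busy for a fixed number of cycles. The function name `simulate` and all parameter values (`num_sets`, `mem_latency`, `port_busy`) are hypothetical, and bus and memory-bank contention are deliberately left out.

```python
def simulate(trace, prefetch, dual_ported,
             num_sets=64, mem_latency=20, port_busy=4):
    """Toy model: return (demand_misses, total_cycles) for a trace of line numbers.

    Assumptions (illustrative only): direct-mapped cache, 1-cycle hit,
    next-line prefetch, and a prefetch that occupies the cache port for
    `port_busy` cycles unless the cache is dual ported.
    """
    resident = {}      # set index -> line number currently held
    ready_at = {}      # line number -> cycle at which its fill completes
    port_free_at = 0   # cycle at which the single cache port is free again
    cycle = 0
    misses = 0
    for line in trace:
        if not dual_ported:
            # A demand access must wait while a prefetch holds the only port.
            cycle = max(cycle, port_free_at)
        cycle += 1                      # cache lookup
        idx = line % num_sets
        if resident.get(idx) == line:
            # Hit, but stall if the line is a prefetch still in flight.
            cycle = max(cycle, ready_at[line])
        else:
            misses += 1
            cycle += mem_latency        # demand fetch from memory
            resident[idx] = line
            ready_at[line] = cycle
        if prefetch:
            nxt = line + 1
            if resident.get(nxt % num_sets) != nxt:
                resident[nxt % num_sets] = nxt
                ready_at[nxt] = cycle + mem_latency
                port_free_at = cycle + port_busy
    return misses, cycle


# Two hypothetical traces: each cache line is referenced 32 times in a row.
sequential = [line for line in range(50) for _ in range(32)]
strided = [line for line in range(0, 100, 2) for _ in range(32)]

for name, trace in (("sequential", sequential), ("strided", strided)):
    print(name,
          "no prefetch:", simulate(trace, False, False),
          "single port:", simulate(trace, True, False),
          "dual port:", simulate(trace, True, True))
```

In this toy setting, prefetching on the sequential trace cuts demand misses from 50 to 1 and helps with either port configuration, though less with a single port. On the strided trace the prefetches are never used: the dual-ported cache is unaffected, while the single-ported cache ends up slower than with no prefetching at all, which mirrors the abstract's point that prefetching without adequate hardware can decrease performance.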
Pages: 509-526
Page count: 18