A comparative analysis of cache designs for vector processing

Cited by: 2
Authors
Sun, T [1]
Yang, Q [1]
Affiliations
[1] Univ Rhode Isl, Dept Elect & Comp Engn, Kingston, RI 02881 USA
Funding
National Science Foundation (USA);
Keywords
performance evaluation; cache memories; memory hierarchy; vector processing; simulation; benchmarks;
DOI
10.1109/12.754999
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
This paper presents an experimental study of cache memory designs for vector computers. We use an execution-driven simulator to evaluate the vector cache performance of a set of application programs from the Perfect Club and SPEC92 benchmark suites. Our simulation results uncover several important facts that were previously unknown. First, the prime-mapped cache that we recently proposed shows great performance potential in a vector processing environment. Because of its conflict-free property, the prime-mapped cache performs significantly better than conventional cache designs for all applications considered. Second, performance results on the benchmarks indicate that data locality does exist in vector processing, although the effects of line size, associativity, replacement algorithm, and prefetching scheme on cache performance are very different from what has been commonly believed. A medium-size vector cache (e.g., 128 Kbytes) eliminates the need for a large number of interleaved memory banks in vector computers. Our experiments show that a vector computer with a medium-size prime-mapped cache, a small cache line size, and a limited amount of prefetching provides significant speedup over conventional vector computers without a cache. The performance results reported in this paper can also guide general-purpose computer designers in enhancing cache performance for numerical applications.
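The abstract attributes the prime-mapped cache's advantage to its conflict-free property but does not spell out the mapping. The sketch below is only an illustration under a stated assumption: that the line index is the line address modulo a Mersenne prime (here 2^13 - 1 = 8191), compared with a conventional power-of-two index. The function name, the cache sizes, and the strides are hypothetical and are not taken from the paper.

```python
def distinct_lines(num_lines, stride, length):
    """Count the distinct cache lines touched by `length` accesses
    whose line addresses are 0, stride, 2*stride, ..."""
    return len({(i * stride) % num_lines for i in range(length)})


C = 13
POWER_OF_TWO_LINES = 1 << C   # 8192 lines, conventional direct-mapped index
PRIME_LINES = (1 << C) - 1    # 8191 lines, a Mersenne prime (assumed mapping)

if __name__ == "__main__":
    vector_length = 64
    for stride in (1, 64, 512, 8192):
        conventional = distinct_lines(POWER_OF_TWO_LINES, stride, vector_length)
        prime = distinct_lines(PRIME_LINES, stride, vector_length)
        print(f"stride={stride:5d}  power-of-two lines used={conventional:3d}  "
              f"prime lines used={prime:3d}")
```

Under this assumption, a 64-element vector maps to 64 distinct lines for any stride that is not a multiple of the prime, whereas power-of-two strides collapse onto a few lines in the conventional index.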
Pages: 331-344
Page count: 14