Utilization of cache area in on-chip multiprocessor

Cited by: 0
Authors
Oi, H [1]
Ranganathan, N
Affiliations
[1] HAL Comp Syst Inc, Campbell, CA 95008 USA
[2] Univ S Florida, Dept Comp Sci & Engn, Tampa, FL 33620 USA
Funding
National Science Foundation (USA)
Keywords
cache area; on-chip multiprocessor; memory latency; performance evaluation;
DOI
10.1016/S0141-9331(00)00094-6
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
An on-chip multiprocessor can be an alternative to the wide-issue superscalar processor, currently the mainstream approach to exploiting the increasing number of transistors on a silicon chip. Utilization of the cache, especially for remote data, is important in systems built from such on-chip multiprocessors, because the ratio of off-chip to on-chip memory access latency is higher than in traditional board-level implementations of cache-coherent non-uniform memory access (CC-NUMA) multiprocessors. We examine two options for utilizing the cache resources of on-chip multiprocessors, whose size is constrained by the die area: (1) caching instructions and/or private data only in the L1 cache, leaving more space in the L2 cache for shared data; and (2) either dividing the cache area between the L2 cache and a remote victim cache or using all of the area for the L2 cache. Results of execution-driven simulations show that the first option improved performance by up to 15%. For the second option, a remote victim cache 1/8 the size of the L2 cache improved the performance of three out of four benchmark programs by 4-8%. However, a combination of L2 and victim caches that splits the cache area into two equal halves was outperformed by an L2 cache occupying the entire cache area in three out of four benchmark programs. © 2000 Elsevier Science B.V. All rights reserved.
Pages: 429-436
Number of pages: 8