A high-bandwidth memory pipeline for wide issue processors

Cited by: 7
Authors
Cho, S [1]
Yew, PC
Lee, G
Affiliations
[1] Samsung Elect Co, Media IP Grp, Yongin, Kyoung Ki, South Korea
[2] Univ Minnesota, Dept Comp Sci & Engn, Minneapolis, MN 55455 USA
[3] Iowa State Univ, Dept Elect & Comp Engn, Ames, IA 50011 USA
Funding
National Science Foundation (USA);
Keywords
data bandwidth; data locality; instruction level parallelism; runtime stack; data stream partitioning; multiported data cache;
DOI
10.1109/12.936237
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Providing adequate data bandwidth is extremely important for a future wide-issue processor to achieve its full performance potential. Adding a large number of ports to a data cache, however, becomes increasingly inefficient and can add significantly to the hardware complexity. This paper takes an alternative or complementary approach to providing more data bandwidth, called data decoupling. In particular, it studies an interesting, yet less explored, behavior of memory access instructions called access region locality, which concerns each static memory instruction and the range of locations it accesses at runtime. Our experimental study using a set of SPEC95 benchmark programs shows that most memory access instructions reference a single region at runtime. It also shows that the access region of a memory instruction can be predicted accurately at runtime by examining the instruction's addressing mode and its past access history. We describe and evaluate a wide-issue superscalar processor with two distinct sets of memory pipelines and caches, driven by the access region predictor. Experimental results indicate that the proposed mechanism is very effective in providing high memory bandwidth to the processor, resulting in performance comparable to or better than that of a conventional memory design with a heavily multiported data cache, which can lead to much higher hardware complexity.
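The prediction idea in the abstract (addressing mode plus past access history) can be illustrated with a small sketch. This is not the predictor described in the paper. It is a toy model built on several assumptions that the abstract does not state: the two regions are taken to be "stack" and "non-stack", the register names `sp` and `fp`, the `stack_low` boundary, the `RegionPredictor` class, and the trace format are all hypothetical.

```python
from dataclasses import dataclass, field

# Assumed region names; the abstract only says there are two sets of pipelines.
STACK, NON_STACK = "stack", "non_stack"
# Assumed names of base registers that hint at a stack access.
STACK_BASE_REGS = {"sp", "fp"}


@dataclass
class RegionPredictor:
    """Toy access region predictor (hypothetical, for illustration only)."""
    stack_low: int                              # assumed lowest stack address
    table: dict = field(default_factory=dict)   # pc -> last observed region

    def actual_region(self, addr):
        # Classify a resolved address against the assumed stack boundary.
        return STACK if addr >= self.stack_low else NON_STACK

    def predict(self, pc, base_reg):
        # Addressing-mode hint: stack/frame pointer bases imply a stack access.
        if base_reg in STACK_BASE_REGS:
            return STACK
        # Otherwise use the last region seen for this static instruction.
        return self.table.get(pc, NON_STACK)

    def update(self, pc, addr):
        # Record the region actually touched once the address is known.
        self.table[pc] = self.actual_region(addr)


def run(trace, predictor):
    """Return the fraction of correctly predicted accesses in the trace."""
    correct = 0
    for pc, base_reg, addr in trace:
        if predictor.predict(pc, base_reg) == predictor.actual_region(addr):
            correct += 1
        predictor.update(pc, addr)
    return correct / len(trace)


# Hypothetical trace of (pc, base register, effective address) tuples.
trace = [
    (0x400, "sp", 0x7FFF0010),  # stack access through sp
    (0x404, "r3", 0x00010000),  # non-stack access, no history yet
    (0x408, "r4", 0x7FFF0020),  # stack access through a general pointer
    (0x408, "r4", 0x7FFF0024),  # same instruction again, history now helps
]
accuracy = run(trace, RegionPredictor(stack_low=0x7FFF0000))
```

In this sketch the third access is mispredicted because the instruction has no history and its base register gives no hint. The fourth is predicted correctly from the recorded history, which mirrors the observation that most static memory instructions keep referencing a single region.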
Pages: 709-723
Page count: 15
Related papers
37 in total
[1] AHO A, 1986, PRINCIPLES TECHNIQUE
[2] [Anonymous], 1342 U WISC COMP SCI
[3] Austin T. M., 1995, Proceedings of the 28th Annual International Symposium on Microarchitecture (Cat. No.95TB100012), P82, DOI 10.1109/MICRO.1995.476815
[4] Cho, S; Yew, PC; Lee, G. Access region locality for high-bandwidth processor memory system design. 32ND ANNUAL INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE, (MICRO-32), PROCEEDINGS, 1999: 136-146
[5] Cho SY, 1999, CONF PROC INT SYMP C, P100
[6] Chrysos, GZ; Emer, JS. Memory dependence prediction using store sets. 25TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE, PROCEEDINGS, 1998: 142-153
[7] DITZEL D, 1982, P S ARCH SUPP PROGR, P48
[8] EDMONDSON JH, 1995, DIGITAL TECHNICAL J, V7
[9] EICKEMEYER RJ, 1993, IBM J RES DEV, V9
[10] FLYNN MJ, 1983, IEEE T COMPUT, V32, P156, DOI 10.1109/TC.1983.1676200