Increasing hardware data prefetching performance using the second-level cache

Times cited: 0
Authors
Drach, N [1]
Béchennec, JL [1]
Temam, O [1]
Affiliation
[1] Paris S Univ, LRI, F-91405 Orsay, France
Keywords
memory hierarchy; cache; superscalar processor; prefetching; bus traffic
DOI
10.1016/S1383-7621(02)00122-4
CLC classification number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
Techniques to reduce or tolerate large memory latencies are critical for achieving high processor performance. Hardware data prefetching is one of the most heavily studied solutions, but it has mostly been applied to first-level caches, where it can severely disrupt processor behavior by delaying normal cache requests, inducing cache pollution, and occupying the heavily used bus to the second-level cache. In this article, we show that applying hardware data prefetching to the second-level cache provides most of the benefits of first-level cache prefetching with almost none of its drawbacks. Moreover, we point out that second-level hardware data prefetching is particularly well suited to out-of-order (OoO) processors: it can hide the long memory latencies caused by second-level cache misses, while OoO execution of memory instructions can hide the shorter latencies of first-level cache misses that hit in the second-level cache. Finally, we show that when the full memory system is taken into account, especially bus traffic, first-level cache prefetching can actually degrade overall processor performance, while second-level cache prefetching consistently improves it. Our experimental results show that the instructions per cycle (IPC) of floating-point programs (SPEC95) increases by 20% on average with second-level cache hardware data prefetching, whereas it decreases by 5% on average with first-level cache hardware data prefetching. (C) 2002 Elsevier Science B.V. All rights reserved.
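As a rough illustration of the idea in the abstract, the sketch below models a two-level cache in which prefetched lines are placed only in the second-level cache, never in the first-level cache. It is a minimal sketch under stated assumptions, not the authors' mechanism: direct-mapped, tag-only caches, 64-byte lines, hypothetical cache sizes, and a simple sequential (next-N-line) prefetcher triggered on L2 demand misses. The `simulate` function and its `l2_prefetch_degree` parameter are hypothetical names. The model counts misses only and ignores timing, prefetch timeliness, and bus contention.

```python
from dataclasses import dataclass, field

LINE = 64  # bytes per cache line (assumed for illustration)


@dataclass
class DirectMappedCache:
    """Tag-only direct-mapped cache: maps each set index to the line address it holds."""
    num_sets: int
    tags: dict = field(default_factory=dict)

    def lookup(self, line_addr: int) -> bool:
        return self.tags.get(line_addr % self.num_sets) == line_addr

    def fill(self, line_addr: int) -> None:
        self.tags[line_addr % self.num_sets] = line_addr


def simulate(addresses, l2_prefetch_degree=0):
    """Return (l1_misses, l2_demand_misses, prefetches_issued).

    Hypothetical policy: on an L2 demand miss, the next `l2_prefetch_degree`
    sequential lines are prefetched into L2 only, so L1 contents are never
    displaced by prefetched data.
    """
    l1 = DirectMappedCache(num_sets=64)    # assumed 4 KB L1
    l2 = DirectMappedCache(num_sets=1024)  # assumed 64 KB L2
    l1_misses = l2_misses = prefetches = 0
    for addr in addresses:
        line = addr // LINE
        if l1.lookup(line):
            continue  # L1 hit: L2 and the L1-L2 path are not touched
        l1_misses += 1
        if not l2.lookup(line):
            l2_misses += 1  # demand fetch from memory
            l2.fill(line)
            # Sequential prefetch into L2 only (assumed policy).
            for k in range(1, l2_prefetch_degree + 1):
                if not l2.lookup(line + k):
                    l2.fill(line + k)
                    prefetches += 1
        l1.fill(line)  # the demanded line is always brought into L1
    return l1_misses, l2_misses, prefetches


if __name__ == "__main__":
    stream = [8 * i for i in range(4096)]  # sequential 8-byte words
    print("no prefetch:", simulate(stream))
    print("L2 prefetch, degree 4:", simulate(stream, l2_prefetch_degree=4))
```

For a purely sequential stream of 4,096 eight-byte words (512 lines), this toy model reports 512 L1 misses in both runs, while L2 demand misses fall from 512 without prefetching to 103 with a prefetch degree of 4 (412 lines prefetched). Because prefetched lines go only into L2, the L1 miss count is unchanged, which mirrors the abstract's point that second-level prefetching avoids polluting the first-level cache; whether prefetches arrive in time to hide memory latency is outside what this model captures.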
Pages: 137-149
Page count: 13