Exploiting Reuse Locality on Inclusive Shared Last-Level Caches

Cited by: 15
Authors
Albericio, Jorge [1]
Ibanez, Pablo [1]
Vinals, Victor [1]
Maria Llaberia, Jose [2]
Affiliations
[1] Univ Zaragoza, Dpto Ing Sistemas & Informat, E-50009 Zaragoza, Spain
[2] UPC Barcelona Tech, DAC, Barcelona, Spain
Keywords
Design; Performance; Replacement policy; Shared resources management; Prediction; Behavior
DOI
10.1145/2400682.2400697
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Optimization of the replacement policy used for Shared Last-Level Cache (SLLC) management in a Chip-MultiProcessor (CMP) is critical for avoiding off-chip accesses. Temporal locality, while being exploited by first levels of private cache memories, is only slightly exhibited by the stream of references arriving at the SLLC. Thus, traditional replacement algorithms based on recency are bad choices for governing SLLC replacement. Recent proposals involve SLLC replacement policies that attempt to exploit reuse either by segmenting the replacement list or improving the rereference interval prediction. On the other hand, inclusive SLLCs are commonplace in the CMP market, but the interaction between replacement policy and the enforcement of inclusion has barely been discussed. After analyzing that interaction, this article introduces two simple replacement policies exploiting reuse locality and targeting inclusive SLLCs: Least Recently Reused (LRR) and Not Recently Reused (NRR). NRR has the same implementation cost as NRU, and LRR only adds one bit per line to the LRU cost. After considering reuse locality and its interaction with the invalidations induced by inclusion, the proposals are evaluated by simulating multiprogrammed workloads in an 8-core system with two private cache levels and an SLLC. LRR outperforms LRU by 4.5% (performing better in 97 out of 100 mixes) and NRR outperforms NRU by 4.2% (performing better in 99 out of 100 mixes). We also show that our mechanisms outperform rereference interval prediction, a recently proposed SLLC replacement policy and that similar conclusions can be drawn by varying the associativity or the SLLC size.
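The abstract names NRR but does not spell out its mechanics, so the following is only a minimal sketch of a one-bit-per-line, reuse-based policy in that spirit, not the authors' implementation. The class name `ReuseBitSet`, the `in_private` argument (standing in for whatever presence information an inclusive SLLC keeps), the victim-selection order, and the bit-reset rule borrowed from NRU are all assumptions made for illustration.

```python
import random


class ReuseBitSet:
    """One cache set with a single 'not recently reused' bit per line.

    Hypothetical sketch: names and policy details are assumptions,
    not taken from the paper.
    """

    def __init__(self, ways, seed=0):
        self.ways = ways
        self.lines = {}  # tag -> True if the line has NOT been reused recently
        self.rng = random.Random(seed)

    def access(self, tag, in_private=frozenset()):
        """Return True on a hit, False on a miss (the line is then inserted)."""
        if tag in self.lines:
            # A hit in the shared cache counts as a reuse of this line.
            self.lines[tag] = False
            # Assumed NRU-style aging: if every line is now marked reused,
            # forget the reuse history of all the other lines.
            if not any(self.lines.values()):
                for other in self.lines:
                    if other != tag:
                        self.lines[other] = True
            return True
        if len(self.lines) >= self.ways:
            self._evict(in_private)
        self.lines[tag] = True  # a newly inserted line has shown no reuse yet
        return False

    def _evict(self, in_private):
        """Pick a victim, preferring lines whose eviction is cheapest."""
        tiers = (
            # 1. Not reused and not held in any private cache.
            [t for t, nr in self.lines.items() if nr and t not in in_private],
            # 2. Reused, but still not held in any private cache.
            [t for t in self.lines if t not in in_private],
            # 3. Anything; evicting forces an inclusion invalidation.
            list(self.lines),
        )
        for candidates in tiers:
            if candidates:
                victim = self.rng.choice(candidates)
                del self.lines[victim]
                return victim
```

In this sketch, a line that hits in the shared cache is protected from the next eviction, and a line still resident in a private cache is avoided as a victim so that inclusion does not invalidate data a core is actively using.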
Pages: 19