Temporal Prefetching Without the Off-Chip Metadata

Cited by: 39
Authors
Wu, Hao [1]
Nathella, Krishnendra [2]
Pusdesris, Joseph [2]
Sunwoo, Dam [2]
Jain, Akanksha [1]
Lin, Calvin [1]
Affiliations
[1] Univ Texas Austin, Austin, TX 78712 USA
[2] Arm Inc, Austin, TX USA
Source
MICRO'52: THE 52ND ANNUAL IEEE/ACM INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE | 2019
Keywords
Data prefetching; irregular temporal prefetching; caches; CPUs; spatial locality; memory
DOI
10.1145/3352460.3358300
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline code
0812
Abstract
Temporal prefetching offers great potential, but this potential is difficult to achieve because of the need to store large amounts of prefetcher metadata off chip. To reduce the latency and traffic of off-chip metadata accesses, recent advances in temporal prefetching have proposed increasingly complex mechanisms that cache and prefetch this off-chip metadata. This paper suggests a return to simplicity: we present a temporal prefetcher whose metadata resides entirely on chip. The key insights are (1) only a small portion of prefetcher metadata is important, and (2) for most workloads with irregular accesses, the benefits of an effective prefetcher outweigh the marginal benefits of a larger data cache. Thus, our solution, the Triage prefetcher, identifies important metadata, stores this metadata in a portion of the LLC, and dynamically partitions the LLC between data and metadata. Our empirical results show that when compared against spatial prefetchers that use only on-chip metadata, Triage performs well, achieving a 23.5% speedup on the irregular subset of SPEC2006, compared to 5.8% for the previous state of the art. When compared against state-of-the-art temporal prefetchers that use off-chip metadata, Triage sacrifices performance on single-core systems (23.5% speedup vs. 34.7% speedup), but its 62% lower traffic overhead translates to better performance in bandwidth-constrained 16-core systems (6.2% speedup vs. 4.3% speedup).
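To make the abstract's central trade-off concrete, here is a minimal Python sketch of a temporal (address-correlation) prefetcher whose metadata is capped at a fixed on-chip budget. Everything in it is a hypothetical illustration, not the Triage design: the class name `OnChipTemporalPrefetcher`, the one-successor-per-address metadata format, the `capacity` parameter standing in for the LLC space given to metadata, and the LRU eviction standing in for a policy that keeps only "important" metadata.

```python
from collections import OrderedDict


class OnChipTemporalPrefetcher:
    """Toy temporal prefetcher whose metadata fits in a fixed on-chip budget.

    Hypothetical model: each metadata entry maps a trigger address to the
    address that followed it the last time it was seen. `capacity` stands in
    for the LLC space set aside for metadata; LRU eviction is a placeholder
    for a real policy that decides which entries are worth keeping.
    """

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.table: "OrderedDict[int, int]" = OrderedDict()  # trigger -> successor
        self.last_addr = None  # previous address in the access stream

    def access(self, addr: int):
        """Train on `addr`, then return the address to prefetch (or None)."""
        # Training: remember that `addr` followed the previous access.
        if self.last_addr is not None:
            self.table[self.last_addr] = addr
            self.table.move_to_end(self.last_addr)  # mark as most recently used
            if len(self.table) > self.capacity:
                self.table.popitem(last=False)  # evict least recently used entry
        self.last_addr = addr

        # Prediction: replay the successor recorded the last time `addr` was seen.
        successor = self.table.get(addr)
        if successor is not None:
            self.table.move_to_end(addr)
        return successor


if __name__ == "__main__":
    pf = OnChipTemporalPrefetcher(capacity=8)
    trace = [1, 5, 9, 1, 5, 9]  # a repeating but non-strided sequence
    print([pf.access(a) for a in trace])  # [None, None, None, 5, 9, 1]
```

Once the sequence repeats, each access predicts its recorded successor. With `capacity=1`, the same trace produces no predictions at all, because each new entry evicts the one that would have been needed next. This illustrates why, under a limited on-chip budget, choosing which metadata to keep and how much LLC to devote to it becomes the central design question.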
Pages: 996-1008
Page count: 13