A Two Level Neural Approach Combining Off-Chip Prediction with Adaptive Prefetch Filtering

Cited by: 4
Authors
Valentin Jamet, Alexandre [1]
Vavouliotis, Georgios [2]
Jimenez, Daniel A. [3]
Alvarez, Lluc [1]
Casas, Marc [1]
Affiliations
[1] Universitat Politècnica de Catalunya (UPC), Barcelona Supercomputing Center (BSC), Barcelona, Spain
[2] Huawei Zurich Research Center, Zurich, Switzerland
[3] Texas A&M University, College Station, TX 77843, USA
Source
2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA 2024)
Funding
US National Science Foundation
DOI: 10.1109/HPCA57654.2024.00046
Chinese Library Classification: TP3 [computing technology; computer technology]
Subject Classification Code: 0812
Abstract
To alleviate the performance and energy overheads of contemporary applications with large data footprints, we propose the Two Level Perceptron (TLP) predictor, a neural mechanism that effectively combines predicting whether an access will be off-chip with adaptive prefetch filtering at the first-level data cache (L1D). TLP is composed of two connected microarchitectural perceptron predictors, named First Level Predictor (FLP) and Second Level Predictor (SLP). FLP performs accurate off-chip prediction using several program features based on virtual addresses and a novel selective delay component. The novelty of SLP lies in leveraging off-chip prediction to drive L1D prefetch filtering, using physical addresses and the FLP prediction as features. TLP constitutes the first hardware proposal targeting both off-chip prediction and prefetch filtering with a multi-level perceptron approach, and it requires only 7KB of storage. To demonstrate the benefits of TLP, we compare its performance against state-of-the-art off-chip prediction and prefetch filtering approaches on a wide range of single-core and multi-core workloads. Our experiments show that TLP reduces average DRAM transactions by 30.7% on single-core workloads and by 17.7% on multi-core workloads, relative to a baseline that uses state-of-the-art cache prefetchers but no off-chip prediction mechanism, whereas recent related work significantly increases DRAM transactions. As a result, TLP achieves geometric mean speedups of 6.2% and 11.8% on single-core and multi-core workloads, respectively. In addition, our evaluation demonstrates that TLP is effective independently of the L1D prefetching logic.
Pages: 528-542 (15 pages)
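
To make the mechanism described in the abstract concrete, the sketch below shows a generic hashed-perceptron predictor of the kind FLP and SLP build on: each input feature indexes a small table of signed saturating weights, the selected weights are summed against a threshold to produce the prediction, and training adjusts the weights only on mispredictions or weak sums. This is a minimal sketch under stated assumptions; the table sizes, weight widths, threshold, feature choices, and function names are illustrative, not the paper's actual configuration.

// Minimal sketch of a hashed-perceptron predictor (assumed parameters;
// not the TLP paper's actual configuration).
#include <array>
#include <cstdint>
#include <cstdlib>

constexpr int    kNumFeatures = 4;    // number of input features (assumed)
constexpr size_t kTableSize   = 256;  // weight-table entries per feature (assumed)
constexpr int    kWeightMax   = 15;   // 5-bit saturating weights (assumed)
constexpr int    kWeightMin   = -16;
constexpr int    kThreshold   = 8;    // activation threshold (assumed)

struct Perceptron {
    // One small table of signed weights per input feature.
    std::array<std::array<int8_t, kTableSize>, kNumFeatures> tables{};

    // Hash a feature value into its weight table (illustrative hash).
    static size_t index(uint64_t feature) {
        return (feature ^ (feature >> 13)) % kTableSize;
    }

    // Predict: sum the weights selected by each feature and compare
    // the sum against a fixed threshold.
    bool predict(const std::array<uint64_t, kNumFeatures>& feats,
                 int* sum_out = nullptr) const {
        int sum = 0;
        for (int f = 0; f < kNumFeatures; ++f)
            sum += tables[f][index(feats[f])];
        if (sum_out) *sum_out = sum;
        return sum >= kThreshold;
    }

    // Train only on a misprediction or a weak (sub-threshold) sum,
    // incrementing/decrementing the selected weights with saturation.
    void train(const std::array<uint64_t, kNumFeatures>& feats, bool outcome) {
        int sum = 0;
        const bool pred = predict(feats, &sum);
        if (pred == outcome && std::abs(sum) >= kThreshold) return;
        for (int f = 0; f < kNumFeatures; ++f) {
            int8_t& w = tables[f][index(feats[f])];
            if (outcome  && w < kWeightMax) ++w;
            if (!outcome && w > kWeightMin) --w;
        }
    }
};

// Hypothetical wiring of the two levels: FLP predicts off-chip from
// virtual-address-derived features; its one-bit prediction becomes an
// SLP input feature alongside physical-address-derived features, and
// SLP's output gates (filters) the L1D prefetch.
bool filter_prefetch(Perceptron& flp, Perceptron& slp,
                     uint64_t pc, uint64_t vaddr, uint64_t paddr) {
    const bool off_chip = flp.predict({pc, vaddr >> 12, vaddr & 0xFFF, pc ^ vaddr});
    return slp.predict({pc, paddr >> 12, paddr >> 6,
                        static_cast<uint64_t>(off_chip)});  // true => issue prefetch
}

The sketch only conveys the table-plus-threshold structure and the FLP-to-SLP feature chaining that the abstract describes; the actual design additionally uses a selective delay component in FLP, and its real feature set and indexing differ.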