A Two Level Neural Approach Combining Off-Chip Prediction with Adaptive Prefetch Filtering

Cited by: 4
Authors
Valentin Jamet, Alexandre [1]
Vavouliotis, Georgios [2]
Jimenez, Daniel A. [3]
Alvarez, Lluc [1]
Casas, Marc [1]
Affiliations
[1] Universitat Politècnica de Catalunya (UPC), Barcelona Supercomputing Center (BSC), Barcelona, Spain
[2] Huawei Zurich Research Center, Zurich, Switzerland
[3] Texas A&M University, College Station, TX 77843, USA
Source
2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA 2024)
Funding
US National Science Foundation
DOI: 10.1109/HPCA57654.2024.00046
Chinese Library Classification: TP3 [computing technology; computer technology]
Subject Classification Code: 0812
Abstract
To alleviate the performance and energy overheads of contemporary applications with large data footprints, we propose the Two Level Perceptron (TLP) predictor, a neural mechanism that effectively combines predicting whether an access will be off-chip with adaptive prefetch filtering at the first-level data cache (L1D). TLP is composed of two connected microarchitectural perceptron predictors, named First Level Predictor (FLP) and Second Level Predictor (SLP). FLP performs accurate off-chip prediction using several program features based on virtual addresses and a novel selective delay component. The novelty of SLP lies in leveraging off-chip prediction to drive L1D prefetch filtering, using physical addresses and the FLP prediction as features. TLP constitutes the first hardware proposal targeting both off-chip prediction and prefetch filtering with a multi-level perceptron approach, and it requires only 7KB of storage. To demonstrate the benefits of TLP, we compare its performance against state-of-the-art off-chip prediction and prefetch filtering approaches on a wide range of single-core and multi-core workloads. Our experiments show that TLP reduces average DRAM transactions by 30.7% on single-core workloads and by 17.7% on multi-core workloads, relative to a baseline that uses state-of-the-art cache prefetchers but no off-chip prediction mechanism, whereas recent related work significantly increases DRAM transactions. As a result, TLP achieves geometric mean speedups of 6.2% and 11.8% on single-core and multi-core workloads, respectively. In addition, our evaluation demonstrates that TLP is effective independently of the L1D prefetching logic.
Pages: 528-542 (15 pages)
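
To make the mechanism described in the abstract concrete, the sketch below shows a generic hashed-perceptron predictor of the kind FLP and SLP build on: each input feature indexes a small table of signed saturating weights, the selected weights are summed against a threshold to produce the prediction, and training adjusts the weights only on mispredictions or weak sums. This is a minimal sketch under stated assumptions; the table sizes, weight widths, threshold, feature choices, and function names are illustrative, not the paper's actual configuration.

// Minimal sketch of a hashed-perceptron predictor (assumed parameters;
// not the TLP paper's actual configuration).
#include <array>
#include <cstdint>
#include <cstdlib>

constexpr int    kNumFeatures = 4;    // number of input features (assumed)
constexpr size_t kTableSize   = 256;  // weight-table entries per feature (assumed)
constexpr int    kWeightMax   = 15;   // 5-bit saturating weights (assumed)
constexpr int    kWeightMin   = -16;
constexpr int    kThreshold   = 8;    // activation threshold (assumed)

struct Perceptron {
    // One small table of signed weights per input feature.
    std::array<std::array<int8_t, kTableSize>, kNumFeatures> tables{};

    // Hash a feature value into its weight table (illustrative hash).
    static size_t index(uint64_t feature) {
        return (feature ^ (feature >> 13)) % kTableSize;
    }

    // Predict: sum the weights selected by each feature and compare
    // the sum against a fixed threshold.
    bool predict(const std::array<uint64_t, kNumFeatures>& feats,
                 int* sum_out = nullptr) const {
        int sum = 0;
        for (int f = 0; f < kNumFeatures; ++f)
            sum += tables[f][index(feats[f])];
        if (sum_out) *sum_out = sum;
        return sum >= kThreshold;
    }

    // Train only on a misprediction or a weak (sub-threshold) sum,
    // incrementing/decrementing the selected weights with saturation.
    void train(const std::array<uint64_t, kNumFeatures>& feats, bool outcome) {
        int sum = 0;
        const bool pred = predict(feats, &sum);
        if (pred == outcome && std::abs(sum) >= kThreshold) return;
        for (int f = 0; f < kNumFeatures; ++f) {
            int8_t& w = tables[f][index(feats[f])];
            if (outcome  && w < kWeightMax) ++w;
            if (!outcome && w > kWeightMin) --w;
        }
    }
};

// Hypothetical wiring of the two levels: FLP predicts off-chip from
// virtual-address-derived features; its one-bit prediction becomes an
// SLP input feature alongside physical-address-derived features, and
// SLP's output gates (filters) the L1D prefetch.
bool filter_prefetch(Perceptron& flp, Perceptron& slp,
                     uint64_t pc, uint64_t vaddr, uint64_t paddr) {
    const bool off_chip = flp.predict({pc, vaddr >> 12, vaddr & 0xFFF, pc ^ vaddr});
    return slp.predict({pc, paddr >> 12, paddr >> 6,
                        static_cast<uint64_t>(off_chip)});  // true => issue prefetch
}

The sketch only conveys the table-plus-threshold structure and the FLP-to-SLP feature chaining that the abstract describes; the actual design additionally uses a selective delay component in FLP, and its real feature set and indexing differ.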