CHERI-picking: Leveraging capability hardware for prefetching

Cited by: 2
Authors
Patel, Shaurya [1]
Agrawal, Sidharth [1]
Fedorova, Alexandra [1]
Seltzer, Margo [1]
Affiliation
[1] University of British Columbia, Vancouver, BC, Canada
Source
Proceedings of the 12th Workshop on Programming Languages and Operating Systems (PLOS 2023), 2023
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
DOI
10.1145/3623759.3624553
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
DRAM now accounts for over 30% of overall datacenter expense [30], owing to its rising cost and diminishing scaling [19, 22]. As applications demand more memory, operators look for cost-effective ways to meet these growing requirements. One approach is disaggregated, or far, memory [18, 23, 25, 30]. Far memory has an access latency roughly an order of magnitude higher than DRAM, so accurate page prefetching is critical. Important applications exhibit pointer-chasing behavior, and existing prefetchers struggle to predict these patterns. We find that 35-78% of page faults in the benchmarks we analyzed are due to pointer accesses, yet the default kernel prefetcher is ineffective for these patterns. We introduce a new generalized kernel pointer prefetcher built on CHERI (Capability Hardware Enhanced RISC Instructions) [32]. Our approach, called CHERI-picking, uses CHERI pointer capabilities to identify memory locations that contain pointers and prefetches the pages those pointers reference, subject to a policy. CHERI-picking requires no application changes, profiling, or offline analysis. We implement CHERI-picking in CheriBSD and evaluate it using benchmarks. Our results show that CHERI-picking is effective where traditional kernel prefetchers are not, indicating the promise of this approach. We also report the overheads of discovering pointers and discuss blocking faults (faults on pages that have been prefetched but are still in transit when the application accesses them), which currently stand in the way of adopting CHERI-picking. Finally, we discuss potential avenues for addressing these challenges.
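The core mechanism described in the abstract, which scans a faulted page for tagged capabilities and prefetches the pages they point to, can be illustrated with a toy fault-count simulation. This is a minimal sketch under stated assumptions, not the CheriBSD implementation. `Cap`, `referenced_pages`, `readahead`, `cheri_pick`, `count_faults`, and `build_list` are hypothetical names. A capability is modeled as a Python object whose presence stands in for a set tag bit. Prefetching completes synchronously on a fault, so blocking faults on in-transit pages are not modeled. The "policy" is simply a cap on how many referenced pages to fetch. The sequential `readahead` baseline is a simplified stand-in for a traditional kernel prefetcher.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumed base page size in bytes


@dataclass(frozen=True)
class Cap:
    """Toy model of a tagged CHERI capability; only the address it points to matters here."""
    address: int


def referenced_pages(page):
    """Pages targeted by tagged capabilities on one page, in granule order, deduplicated.

    `page` maps granule index -> value. A Cap models a granule whose tag bit is set;
    a plain int is untagged data and is ignored even if it looks like an address.
    """
    targets = []
    for granule in sorted(page):
        value = page[granule]
        if isinstance(value, Cap):
            target = value.address // PAGE_SIZE
            if target not in targets:
                targets.append(target)
    return targets


def readahead(depth):
    """Hypothetical sequential baseline: on a fault at page p, also fetch p+1 .. p+depth."""
    return lambda page_no, memory: range(page_no + 1, page_no + 1 + depth)


def cheri_pick(max_pages):
    """Hypothetical pointer-following policy: fetch up to max_pages pages referenced
    by tagged capabilities on the faulting page (one hop only)."""
    return lambda page_no, memory: referenced_pages(memory.get(page_no, {}))[:max_pages]


def count_faults(accesses, memory, prefetch):
    """Replay a page-access trace and count faults; prefetching runs only on a fault."""
    resident = set()
    faults = 0
    for page_no in accesses:
        if page_no not in resident:
            faults += 1
            resident.add(page_no)
            resident.update(prefetch(page_no, memory))
    return faults


def build_list(node_pages):
    """A linked list with one node per page, laid out in a scattered page order."""
    memory = {}
    for i, page_no in enumerate(node_pages):
        node = {1: 5 * PAGE_SIZE}  # untagged payload that merely looks like an address
        if i + 1 < len(node_pages):
            node[0] = Cap(node_pages[i + 1] * PAGE_SIZE)  # tagged "next" pointer
        memory[page_no] = node
    return memory


if __name__ == "__main__":
    order = [10, 2, 14, 6, 18, 0, 22, 30]  # traversal order of the list nodes
    mem = build_list(order)
    print("no prefetch:", count_faults(order, mem, lambda p, m: ()))   # prints "no prefetch: 8"
    print("readahead(2):", count_faults(order, mem, readahead(2)))     # prints "readahead(2): 8"
    print("cheri_pick(1):", count_faults(order, mem, cheri_pick(1)))   # prints "cheri_pick(1): 4"
```

In this toy trace, two-page sequential readahead never brings in the next list node, so it incurs the same 8 faults as no prefetching, while following one tagged pointer per fault halves the count to 4. Each node's untagged payload looks like a page address but is skipped, which illustrates why hardware tags, rather than guessing which integers are pointers, make this kind of scan precise.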
Pages: 58-65
Page count: 8
References
34 in total
[1] Al Maruf H, 2020, Proceedings of the 2020 USENIX Annual Technical Conference, p. 843
[2] Amaro E, Branner-Augmon C, Luo Z, Ousterhout A, Aguilera MK, Panda A, Ratnasamy S, Shenker S, 2020, Can Far Memory Improve Job Throughput?, Proceedings of the Fifteenth European Conference on Computer Systems (EuroSys '20)
[3] Anonymous, 2000, Linux perf probe
[4] Anonymous, 2008, zswap - The Linux Kernel documentation
[5] ARM, 2022, Morello Program - ARM
[6] Ayers G, Litz H, Kozyrakis C, Ranganathan P, 2020, Classifying Memory Access Patterns for Prefetching, Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS XXV), pp. 513-526
[7] Beamer S, 2015, arXiv
[8] Bienia C, 2011, Benchmarking modern multiprocessors
[9] CheriBSD, 2023, CheriBSD prefetcher
[10] CheriBSD, 2023, Swap pager