Multi-step LRU: SIMD-based Cache Replacement for Lower Overhead and Higher Precision

Cited by: 2
Author
Inoue, Hiroshi [1]
Affiliation
[1] IBM Res Tokyo, Tokyo, Japan
Source
2021 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA) | 2021
Keywords
Cache replacement; LRU; SIMD
DOI
10.1109/BigData52589.2021.9671363
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
A key-value cache is a key component of many services, providing low-latency, high-throughput access to huge amounts of data. To improve the end-to-end performance of such services, a key-value cache must achieve a high cache hit ratio with high throughput. In this paper, we propose a new cache replacement algorithm, multi-step LRU, which achieves high throughput by efficiently exploiting SIMD instructions without using per-item additional memory (LRU metadata) to record information such as the last access timestamp. For a small set of items that fits within a vector register, SIMD-based LRU management without LRU metadata is known (in-vector LRU); it remembers the access history by reordering the items in one vector with a vector shuffle instruction. In-vector LRU alone cannot serve as a caching system, since it can manage only a few items. A set-associative cache is a straightforward way to build a large cache using in-vector LRU as a building block; however, a naive set-associative cache based on in-vector LRU achieves high throughput but a poorer cache hit ratio than the original LRU. Our multi-step LRU enhances this naive set-associative design, improving cache accuracy by taking both the access frequency and the access recency of items into account while keeping the efficiency of SIMD instructions. Our results indicate that multi-step LRU outperforms the original LRU and GCLOCK algorithms in both execution speed and cache hit ratio. Multi-step LRU improves the cache hit ratio over the original LRU by implicitly taking the access frequency of items, as well as their access recency, into account. Its cache hit ratios are similar to those of ARC, which achieves a higher hit ratio at the cost of using more LRU metadata.
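To make the mechanism concrete, below is a minimal C sketch of the in-vector LRU idea the abstract describes, using AVX2 intrinsics: eight 32-bit keys live in one vector register ordered from most- to least-recently-used, recency is updated purely by lane shuffles, and a set-associative wrapper builds a large cache from many such sets. The 8-way/32-bit layout, function names, and hash choice are illustrative assumptions, not the paper's actual implementation; the sketch targets GCC/Clang with -mavx2.

```c
#include <immintrin.h>
#include <stdint.h>
#include <stdbool.h>

/* Move the hit lane to position 0 (MRU) and shift the lanes in front of
 * it down by one; lanes behind the hit keep their order. */
static __m256i move_to_front(__m256i set, int hit_lane)
{
    /* Precomputed shuffle indices for each possible hit lane.
     * Row i rotates lanes 0..i right by one: [i, 0, 1, ..., i-1, i+1, ...]. */
    static const int32_t perm[8][8] = {
        {0,1,2,3,4,5,6,7}, {1,0,2,3,4,5,6,7},
        {2,0,1,3,4,5,6,7}, {3,0,1,2,4,5,6,7},
        {4,0,1,2,3,5,6,7}, {5,0,1,2,3,4,6,7},
        {6,0,1,2,3,4,5,7}, {7,0,1,2,3,4,5,6},
    };
    __m256i idx = _mm256_loadu_si256((const __m256i *)perm[hit_lane]);
    return _mm256_permutevar8x32_epi32(set, idx);
}

/* Look up `key` in one 8-way set. On a hit, shuffle lanes so the key
 * becomes MRU. On a miss, evict the LRU lane (lane 7) by rotating all
 * lanes and overwriting lane 0 with `key`. Empty slots are assumed to
 * hold a reserved sentinel key. Returns true on a hit. */
static bool access_set(__m256i *set, int32_t key)
{
    __m256i eq = _mm256_cmpeq_epi32(*set, _mm256_set1_epi32(key));
    int mask = _mm256_movemask_ps(_mm256_castsi256_ps(eq));
    if (mask) {                             /* cache hit */
        int lane = __builtin_ctz(mask);     /* lowest matching lane */
        *set = move_to_front(*set, lane);
        return true;
    }
    /* cache miss: old lane 7 (the LRU item) is rotated to lane 0,
     * then replaced by the newly inserted key */
    *set = move_to_front(*set, 7);
    *set = _mm256_insert_epi32(*set, key, 0);
    return false;
}

/* Set-associative wrapper: hash the key to pick one of `num_sets`
 * 8-item sets (the multiplicative hash here is a placeholder). */
static bool cache_access(__m256i *sets, size_t num_sets, int32_t key)
{
    size_t set_idx = ((uint32_t)key * 2654435761u) % num_sets;
    return access_set(&sets[set_idx], key);
}
```

Note that this sketch implements only the naive set-associative in-vector LRU that the abstract identifies as the baseline; the paper's multi-step LRU refines how items advance toward the MRU position so that access frequency is also captured, which is not reproduced here.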
Pages: 174-180
Page count: 7