TicToc: Enabling Bandwidth-Efficient DRAM Caching for both Hits and Misses in Hybrid Memory Systems

Cited by: 9

Authors:

Young, Vinson [1]

Chishti, Zeshan A. [2]

Qureshi, Moinuddin K. [1]

Affiliations:

[1] Georgia Inst Technol, Atlanta, GA 30332 USA

[2] Intel, Santa Clara, CA USA

Source:

2019 IEEE 37TH INTERNATIONAL CONFERENCE ON COMPUTER DESIGN (ICCD 2019) | 2019

Keywords:

MAIN MEMORY; LATENCY;

DOI:

10.1109/ICCD46524.2019.00055

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

This paper investigates bandwidth-efficient DRAM caching for hybrid DRAM + 3D-XPoint memories. 3D-XPoint is becoming a viable alternative to DRAM as it enables high-capacity and non-volatile main memory systems. However, 3D-XPoint has several characteristics that limit it from outright replacing DRAM: 4-8x slower read, and even worse writes. As such, effective DRAM caching in front of 3D-XPoint is important to enable a high-capacity, low-latency, and high-write-bandwidth memory. There are currently two major approaches for DRAM cache design: (1) a Tag-Inside-Cacheline (TIC) organization that optimizes for hits, by storing tag next to each line such that one access gets both tag and data, and (2) a Tag-Outside-Cacheline (TOC) organization that optimizes for misses, by storing tags from multiple data lines together in a tag-line such that one access to a tag-line gets information on several data-lines. Ideally, we would like to have the low hit-latency of TIC designs, and the low miss-bandwidth of TOC designs. To this end, we propose a TicToc organization that provisions both TIC and TOC to get the hit and miss benefits of both. We find that naively combining both techniques actually performs worse than TIC individually, because one has to pay the bandwidth cost of maintaining both metadata. The main contribution of this work is developing architectural techniques to reduce bandwidth cost of accessing and maintaining both TIC and TOC metadata. We find that most of the update bandwidth is due to maintaining the TOC dirty information. We propose a DRAM Cache Dirtiness Bit technique that carries DRAM cache dirty information to last-level caches, to help prune repeated dirty-bit updates for known dirty lines. We also propose a Preemptive Dirty Marking (PDM) technique that predicts which lines will be written and proactively marks the dirty bit at install time, to help avoid the initial dirty-bit update for dirty lines. To support PDM, we develop a novel PC-based Write-Predictor to aid in marking only write-likely lines. Our evaluations on a 4GB DRAM cache in front of 3D-XPoint show that our TicToc organization enables 10% speedup over the baseline TIC, nearing the 14% speedup possible with an idealized DRAM cache design with 64MB of SRAM tags, while needing only 34KB SRAM.
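Illustrative note (not part of the abstract): the idea of a PC-based write predictor driving Preemptive Dirty Marking can be sketched roughly as below. This is a minimal sketch under stated assumptions, not the paper's design: the table size, the 2-bit saturating counters, the threshold, the PC hashing, and the names `WritePredictor` and `install_line` are all hypothetical choices made for illustration.

```python
class WritePredictor:
    """Hypothetical PC-indexed table of saturating counters.

    Sizing, counter width, and threshold are assumptions for illustration;
    the abstract does not specify them.
    """

    def __init__(self, entries=1024, threshold=2, max_count=3):
        self.entries = entries
        self.threshold = threshold
        self.max_count = max_count
        self.table = [0] * entries  # all counters start at "not write-likely"

    def _index(self, pc):
        # Drop the low two bits of the PC, then fold into the table (assumed hash).
        return (pc >> 2) % self.entries

    def predict_write(self, pc):
        """True if lines installed by this PC are predicted to be written."""
        return self.table[self._index(pc)] >= self.threshold

    def train(self, pc, was_written):
        """Update the counter once the line's actual write behavior is known."""
        i = self._index(pc)
        if was_written:
            self.table[i] = min(self.max_count, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)


def install_line(predictor, pc):
    """Return the dirty bit to store at install time.

    Marking a write-likely line dirty up front avoids a separate
    dirty-bit update when the first write arrives later.
    """
    return predictor.predict_write(pc)
```

In this sketch, a line installed by a PC whose earlier lines were written back dirty is marked dirty at install time, while lines from other PCs are installed clean and pay the dirty-bit update only if they are actually written.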

Pages: 341-349

Page count: 9
