TicToc: Enabling Bandwidth-Efficient DRAM Caching for both Hits and Misses in Hybrid Memory Systems

Cited by: 9
Authors
Young, Vinson [1]
Chishti, Zeshan A. [2]
Qureshi, Moinuddin K. [1]
Affiliations
[1] Georgia Inst Technol, Atlanta, GA 30332 USA
[2] Intel, Santa Clara, CA USA
Source
2019 IEEE 37TH INTERNATIONAL CONFERENCE ON COMPUTER DESIGN (ICCD 2019) | 2019
Keywords
MAIN MEMORY; LATENCY
DOI
10.1109/ICCD46524.2019.00055
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
This paper investigates bandwidth-efficient DRAM caching for hybrid DRAM + 3D-XPoint memories. 3D-XPoint is becoming a viable alternative to DRAM as it enables high-capacity and non-volatile main memory systems. However, 3D-XPoint has several characteristics that keep it from outright replacing DRAM: its reads are 4-8x slower, and its writes are slower still. As such, effective DRAM caching in front of 3D-XPoint is important to enable a high-capacity, low-latency, and high-write-bandwidth memory. There are currently two major approaches to DRAM cache design: (1) a Tag-Inside-Cacheline (TIC) organization that optimizes for hits by storing the tag next to each line, such that one access gets both tag and data, and (2) a Tag-Outside-Cacheline (TOC) organization that optimizes for misses by storing tags from multiple data lines together in a tag-line, such that one access to a tag-line gets information on several data lines. Ideally, we would like to have the low hit latency of TIC designs and the low miss bandwidth of TOC designs. To this end, we propose a TicToc organization that provisions both TIC and TOC to get the hit and miss benefits of both. We find that naively combining both techniques actually performs worse than TIC alone, because one has to pay the bandwidth cost of maintaining both sets of metadata. The main contribution of this work is developing architectural techniques to reduce the bandwidth cost of accessing and maintaining both TIC and TOC metadata. We find that most of the update bandwidth is due to maintaining the TOC dirty information. We propose a DRAM Cache Dirtiness Bit technique that carries DRAM cache dirty information to last-level caches, to help prune repeated dirty-bit updates for lines already known to be dirty. We also propose a Preemptive Dirty Marking (PDM) technique that predicts which lines will be written and proactively marks the dirty bit at install time, to help avoid the initial dirty-bit update for dirty lines. To support PDM, we develop a novel PC-based Write-Predictor to aid in marking only write-likely lines. Our evaluations on a 4GB DRAM cache in front of 3D-XPoint show that our TicToc organization enables a 10% speedup over the baseline TIC, nearing the 14% speedup possible with an idealized DRAM cache design with 64MB of SRAM tags, while needing only 34KB of SRAM.
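The abstract does not describe how the PC-based Write-Predictor is built. As a rough illustration of the idea only, the Python sketch below assumes a small table of 2-bit saturating counters indexed by a hash of the installing instruction's PC, trained on whether a line was written while it was cached. The class name, table size, hash, and threshold are all hypothetical choices for this sketch and are not taken from the paper.

```python
class WritePredictor:
    """Hypothetical PC-indexed write predictor (not the paper's design)."""

    def __init__(self, entries=256, threshold=2):
        self.entries = entries          # assumed table size
        self.threshold = threshold      # counter value at which we predict "write"
        self.counters = [0] * entries   # 2-bit saturating counters, range 0..3

    def _index(self, pc):
        # Assumed hash: fold the upper PC bits onto the lower ones.
        return (pc ^ (pc >> 8)) % self.entries

    def predict_write(self, pc):
        """Return True if lines installed by this PC are likely to be written."""
        return self.counters[self._index(pc)] >= self.threshold

    def train(self, pc, was_written):
        """Update the counter once the outcome for a line is known."""
        i = self._index(pc)
        if was_written:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def dirty_bit_at_install(predictor, pc):
    """Preemptive Dirty Marking: choose the dirty bit recorded at install time.

    Marking a write-likely line dirty up front avoids a separate
    dirty-bit update when the first write to that line arrives.
    """
    return predictor.predict_write(pc)


if __name__ == "__main__":
    predictor = WritePredictor()
    pc = 0x400A10  # hypothetical PC of a store-heavy instruction
    print(dirty_bit_at_install(predictor, pc))
    predictor.train(pc, was_written=True)
    predictor.train(pc, was_written=True)
    print(dirty_bit_at_install(predictor, pc))
```

In this sketch a mispredicted line marked dirty would cost an unnecessary writeback on eviction, which is why the counter must reach the threshold before a PC is trusted; the paper's actual trade-off and predictor sizing may differ.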
Pages: 341-349
Page count: 9