Bandwidth-Effective DRAM Cache for GPUs with Storage-Class Memory

Cited by: 1
Authors
Hong, Jeongmin [1]
Cho, Sungjun [1]
Park, Geonwoo [1]
Yang, Wonhyuk [1]
Gong, Young-Ho [2]
Kim, Gwangsun [1]
Affiliations
[1] POSTECH, Dept Comp Sci & Engn, Pohang Si, South Korea
[2] Soongsil Univ, Sch Software, Seoul, South Korea
Funding
National Research Foundation, Singapore
Keywords
PHASE-CHANGE MEMORY; HIGH-PERFORMANCE; MAIN MEMORY; ARCHITECTURE; EFFICIENT; SYSTEM;
DOI
10.1109/HPCA57654.2024.00021
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
We propose overcoming the memory capacity limitation of GPUs with high-capacity Storage-Class Memory (SCM) and a DRAM cache. By significantly increasing the memory capacity with SCM, the GPU can capture a larger fraction of the memory footprint than HBM for workloads that mandate memory oversubscription, resulting in substantial speedups. However, the DRAM cache needs to be carefully designed to address the latency and bandwidth limitations of the SCM while minimizing cost overhead and considering the GPU's characteristics. Because the massive number of GPU threads can easily thrash the DRAM cache and degrade performance, we first propose an SCM-aware DRAM cache bypass policy for GPUs that considers the multidimensional characteristics of memory accesses by GPUs with SCM to bypass DRAM for data with low performance utility. In addition, to reduce DRAM cache probe traffic and increase effective DRAM bandwidth with minimal cost overhead, we propose a Configurable Tag Cache (CTC) that repurposes part of the L2 cache to cache DRAM cacheline tags. The L2 capacity used for the CTC can be adjusted by users for adaptability. Furthermore, to minimize DRAM cache probe traffic from CTC misses, our Aggregated Metadata-In-Last-column (AMIL) DRAM cache organization co-locates all DRAM cacheline tags in a single column within a row. AMIL also retains full ECC protection, unlike prior DRAM cache implementations with the Tag-And-Data (TAD) organization. Additionally, we propose SCM throttling to curtail power consumption and exploit the SCM's SLC/MLC modes to adapt to the workload's memory footprint. While our techniques can be used for different DRAM and SCM devices, we focus on a Heterogeneous Memory Stack (HMS) organization that stacks SCM dies on top of DRAM dies for high performance. Compared to HBM, the HMS improves performance by up to 12.5x (2.9x overall) and reduces energy by up to 89.3% (48.1% overall). Compared to prior work, we reduce DRAM cache probe and SCM write traffic by 91-93% and 57-75%, respectively.
Pages: 139-155
Page count: 17