Hardware and Software Co-Design for Optimized Decoding Schemes and Application Mapping in NVM Compute-in-Memory Architectures

Times cited: 0
Authors
Siddaramu, Shanmukha Mangadahalli [1]
Nezhadi, Ali [1]
Mayahinia, Mahta [1]
Ghasemi, Seyedehmaryam [1]
Tahoori, Mehdi B. [1]
Affiliation
[1] Karlsruhe Inst Technol, Dept Comp Sci, D-76131 Karlsruhe, Germany
Keywords
Power demand; Nonvolatile memory; System performance; Systems architecture; Data processing; Software; Decoding; Sensors; Arrays; Optimization; Binary tree data structure; computation-in-memory (CiM); decoder; gem5; latch;
DOI
10.1109/TCAD.2024.3447216
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
The computation-in-nonvolatile-memory (NVM-CiM) approach addresses the growing computational demands and the memory-wall problem faced by traditional processor-centric architectures. Computation-in-memory (CiM) capitalizes on the parallel nature of memory arrays, enabling effective computation through multirow memristor reading and sensing. The conventional design of memory decoders therefore needs to be modified to support efficient multirow activation and parallel data processing. This article presents the design and optimization of address decoders for NVM-CiM system architectures, using a cross-layer co-optimization approach that integrates circuit and architecture design with application requirements. Our methodology starts at the circuit level, examining various decoder designs, including cascaded, hierarchical, latched, and hybrid models. An in-depth application-level characterization follows, using an extended, NVM-CiM-capable gem5 simulator to assess how these decoders affect the mapping of CiM-friendly applications and the resulting system performance, particularly their ability to activate multirow memory configurations rapidly and efficiently. This holistic analysis allows us to identify the bottlenecks and requirements on the application side and to adjust the decoder design accordingly. Our analysis reveals that hybrid decoders significantly decrease latency and power consumption compared with the other decoder designs within NVM-CiM systems. This highlights the crucial role of the decoder's row-selection flexibility: by reducing additional system-level data movement, even at the expense of the decoder's own performance, it can substantially improve the overall efficiency of NVM-CiM systems.
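To make the multirow-activation requirement concrete, the following is a minimal behavioral sketch, assuming an idealized array and not reproducing any of the authors' decoder circuits. A conventional address decoder turns an n-bit address into a one-hot word-line vector, so only one row is active at a time. One generic way to hold several rows active for a single CiM read is to latch each decoded selection, which is loosely the idea behind a latched decoder. All names here (`one_hot_decode`, `LatchedDecoder`, `activate_rows`, `cim_column_sums`) are hypothetical, and the bit-line "sum" is an idealization of multirow sensing.

```python
from typing import List, Sequence, Tuple


def one_hot_decode(address: int, n_bits: int) -> List[int]:
    """Conventional address decoder: an n-bit address drives exactly one word line."""
    rows = 1 << n_bits
    if not 0 <= address < rows:
        raise ValueError(f"address {address} out of range for {n_bits} address bits")
    return [1 if r == address else 0 for r in range(rows)]


class LatchedDecoder:
    """Hypothetical behavioral model (not the paper's circuit): one latch per
    word line stores each decoded selection, so several rows can be held
    active together for a single multirow CiM read."""

    def __init__(self, n_bits: int) -> None:
        self.n_bits = n_bits
        self.latches = [0] * (1 << n_bits)

    def reset(self) -> None:
        # Clear all latched selections before a new multirow operation.
        self.latches = [0] * (1 << self.n_bits)

    def select(self, address: int) -> None:
        # One decode per call; OR the one-hot result into the latch bank.
        onehot = one_hot_decode(address, self.n_bits)
        self.latches = [held | new for held, new in zip(self.latches, onehot)]

    def word_lines(self) -> List[int]:
        return list(self.latches)


def activate_rows(decoder: LatchedDecoder, addresses: Sequence[int]) -> Tuple[List[int], int]:
    """Latch every requested row; return the word-line vector and the number of
    sequential select steps used (one per address in this simple model)."""
    decoder.reset()
    for address in addresses:
        decoder.select(address)
    return decoder.word_lines(), len(addresses)


def cim_column_sums(array: Sequence[Sequence[int]], word_lines: Sequence[int]) -> List[int]:
    """Idealized multirow sensing: each bit line sees the sum of the stored bits
    of all activated rows (a real sense amplifier would threshold this value)."""
    return [
        sum(row[col] for row, wl in zip(array, word_lines) if wl)
        for col in range(len(array[0]))
    ]


if __name__ == "__main__":
    # 3 address bits -> 8 rows; row r of the toy array stores the 4 low bits of r.
    array = [[(r >> c) & 1 for c in range(4)] for r in range(8)]
    wl, steps = activate_rows(LatchedDecoder(3), [1, 4, 6])
    print(wl, steps)                     # [0, 1, 0, 0, 1, 0, 1, 0] 3
    print(cim_column_sums(array, wl))    # [1, 1, 2, 0]
```

In this toy model, latching k rows costs k sequential decode steps. That is the kind of per-operation overhead the abstract weighs against the system-level benefit of flexible row selection. The sketch itself makes no latency or power claims about any specific decoder design.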
Pages: 3744-3755
Number of pages: 12