Hardware and Software Co-Design for Optimized Decoding Schemes and Application Mapping in NVM Compute-in-Memory Architectures

Cited by: 0

Authors:

Siddaramu, Shanmukha Mangadahalli ^{[1]}

Nezhadi, Ali ^{[1]}

Mayahinia, Mahta ^{[1]}

Ghasemi, Seyedehmaryam ^{[1]}

Tahoori, Mehdi B. ^{[1]}

Affiliations:

[1] Karlsruhe Institute of Technology, Department of Computer Science, D-76131 Karlsruhe, Germany

Source:

IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS | 2024 / Vol. 43 / No. 11

Keywords:

Power demand; Nonvolatile memory; System performance; Systems architecture; Data processing; Software; Decoding; Sensors; Arrays; Optimization; Binary tree data structure; computation-in-memory (CiM); decoder; gem5; latch;

DOI:

10.1109/TCAD.2024.3447216

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

The computation-in-nonvolatile-memory (NVM-CiM) approach addresses the growing computational demands and the memory-wall problem faced by traditional processor-centric architectures. Computation-in-memory (CiM) capitalizes on the parallel nature of memory arrays, enabling effective computation through multirow memristor reading and sensing. In this context, the conventional design of memory decoders must be modified accordingly for efficient multirow activation and parallel data processing. This article presents the design and optimization of address decoders for NVM-CiM system architectures, employing a cross-layer co-optimization approach that integrates circuit and architecture design with application requirements. Our methodology starts at the circuit level, examining various decoder designs, including cascaded, hierarchical, latched, and hybrid models. An in-depth application-level characterization follows, utilizing an extended NVM-CiM-capable gem5 simulator to assess the impact of these decoders on the mapping of CiM-friendly applications and on the resulting system performance, particularly in facilitating rapid and efficient activation of multirow memory configurations. This holistic analysis allows us to identify the bottlenecks and requirements from the application side and adjust the design of the decoder accordingly. Our analysis reveals that hybrid decoders significantly decrease latency and power consumption compared to other decoder designs within NVM-CiM systems. This highlights the crucial role of the decoder's row-selection flexibility: reducing additional system-level data movement, even at the expense of the decoder's own performance, can substantially improve the overall efficiency of NVM-CiM systems.
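Illustrative note (not part of the published abstract): the multirow activation that the abstract refers to can be sketched with a small behavioral model. Everything below is an assumption made for illustration; the function names `one_hot_decode`, `latched_multirow_decode`, and `sense_columns` are hypothetical, and the model does not reproduce the cascaded, hierarchical, latched, or hybrid circuits evaluated in the article. It only shows why a decoder that can hold several wordlines active lets the sense amplifiers compute a bitwise OR or AND across rows in a single read.

```python
from typing import List


def one_hot_decode(address: int, n_rows: int) -> List[int]:
    """Conventional decoder: exactly one wordline is driven per address."""
    if not 0 <= address < n_rows:
        raise ValueError("address out of range")
    return [1 if row == address else 0 for row in range(n_rows)]


def latched_multirow_decode(addresses: List[int], n_rows: int) -> List[int]:
    """Hypothetical latched decoder: every decoded address sets a latch,
    so several wordlines remain active at the same time."""
    wordlines = [0] * n_rows
    for address in addresses:
        for row, bit in enumerate(one_hot_decode(address, n_rows)):
            wordlines[row] |= bit
    return wordlines


def sense_columns(array: List[List[int]], wordlines: List[int], op: str) -> List[int]:
    """Idealized sensing: count conducting cells (stored 1s) on each bitline
    among the active rows and compare the count against a reference.
    A reference of 1 yields OR; a reference equal to the number of active
    rows yields AND."""
    active = sum(wordlines)
    if active == 0:
        raise ValueError("no wordline is active")
    if op not in ("or", "and"):
        raise ValueError("op must be 'or' or 'and'")
    reference = 1 if op == "or" else active
    result = []
    for col in range(len(array[0])):
        current = sum(array[row][col] for row in range(len(array)) if wordlines[row])
        result.append(1 if current >= reference else 0)
    return result


# Toy 4x4 array; rows 0 and 2 are the operands.
memory = [
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
]
active_rows = latched_multirow_decode([0, 2], n_rows=4)
or_result = sense_columns(memory, active_rows, "or")
and_result = sense_columns(memory, active_rows, "and")
```

In this toy model, operands stored in arbitrary rows can be combined without first being copied next to each other, which is one way to read the abstract's point about row-selection flexibility reducing system-level data movement.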


Pages: 3744-3755

Page count: 12

