Dyamond: Compact and Efficient 1T1C DRAM IMC Accelerator With Bit Column Addition for Memory-Intensive AI

Cited by: 0
Authors
Hong, Seongyon [1 ]
Jo, Wooyoung [1 ]
Kim, Sangjin [1 ]
Kim, Sangyeob [1 ]
Um, Soyeon [1 ]
Sohn, Kyomin [2 ]
Yoo, Hoi-Jun [1 ]
Affiliations
[1] Korea Adv Inst Sci & Technol KAIST, Sch Elect Engn, Daejeon 34141, South Korea
[2] Samsung Elect, FLASH & DRAM Design Team, Memory Div, Hwaseong 18448, South Korea
Keywords
Random access memory; Energy efficiency; Computer architecture; Artificial intelligence; Arrays; Single instruction multiple data; Computational efficiency; System-on-chip; In-memory computing; Accuracy; Artificial intelligence (AI); bit column addition (BCA) dataflow; compact MAC-SIMD (CMS) circuit; dynamic random access memory (DRAM); in-memory computing (IMC);
DOI
10.1109/JSSC.2025.3538899
Chinese Library Classification: TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Discipline Codes: 0808; 0809
Abstract
This article proposes Dyamond, a one-transistor, one-capacitor (1T1C) dynamic random access memory (DRAM) in-memory computing (IMC) accelerator with architecture-to-circuit-level optimizations for high memory density and energy efficiency. The bit column addition (BCA) dataflow introduces output bit-wise accumulation to exploit the varying accuracy and energy characteristics across different bit positions. The lower BCA (LBCA) reduces analog-to-digital converter (ADC) operations through inter-column analog accumulation, enhancing energy efficiency. The higher BCA (HBCA) improves accuracy through signal enhancement and minimizes energy consumption per ADC readout with signal shift (SS). The design maximizes memory density by dedicating 1T1C cells solely to memory and integrating a compact computation circuit adjacent to the bitline sense amplifier. Memory access power is further reduced with a big-little array structure and a switchable sense amplifier (SWSA), which trades off retention time against energy consumption. Fabricated in 28-nm CMOS, Dyamond integrates 3.54-MB DRAM in a 6.48-mm² area, achieving 27.2-TOPS/W peak efficiency and outstanding performance on advanced models such as BERT and GPT-2.
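The abstract only summarizes the BCA dataflow. As a rough numerical illustration of per-bit-column accumulation, the sketch below decomposes each weight into bit columns, accumulates every column's partial sum separately, and recombines them with shifts according to bit position. The bit width, the bca_mac helper, and the scalar per-column sums are illustrative assumptions; they do not reflect Dyamond's actual circuit, its LBCA/HBCA split, or its analog accumulation.

```python
import numpy as np

# Conceptual sketch of a bit-column-addition (BCA) style MAC:
# weights are split into bit columns, each column is accumulated
# separately, and the columns are combined via shift-add. In the
# described design, lower bit columns would be merged in the analog
# domain before a single ADC readout; this digital sum only mimics
# that behavior for illustration.

def bca_mac(activations: np.ndarray, weights: np.ndarray, w_bits: int = 4) -> int:
    """Multiply-accumulate computed from per-bit-column partial sums."""
    assert activations.shape == weights.shape
    total = 0
    for b in range(w_bits):
        # Bit column b of every weight (0/1 per element).
        bit_col = (weights >> b) & 1
        # Partial sum contributed by this bit column.
        partial = int(np.dot(activations, bit_col))
        # Each column carries its binary weight 2^b.
        total += partial << b
    return total

acts = np.array([3, 1, 2, 4])
wts = np.array([5, 2, 7, 1])      # 4-bit unsigned weights
print(bca_mac(acts, wts), np.dot(acts, wts))  # both print 35
```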
Pages: 1299 - 1310
Page count: 12