Design of Processing-in-Memory With Triple Computational Path and Sparsity Handling for Energy-Efficient DNN Training

Cited by: 1
Authors
Han, Wontak [1 ]
Heo, Jaehoon [1 ]
Kim, Junsoo [1 ]
Lim, Sukbin [1 ]
Kim, Joo-Young [1 ]
Affiliations
[1] Korea Adv Inst Sci & Technol, Dept Elect Engn, Daejeon 34141, South Korea
Keywords
Training; Computational modeling; Computer architecture; Deep learning; Circuits and systems; Power demand; Neurons; Accelerator architecture; machine learning; processing-in-memory architecture; bit-serial operation; inference; training; sparsity handling; SRAM; energy-efficient architecture; DEEP NEURAL-NETWORKS; SRAM; ACCELERATOR; MACRO
DOI
10.1109/JETCAS.2022.3168852
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Discipline Codes
0808; 0809
Abstract
As machine learning (ML) and artificial intelligence (AI) have become mainstream technologies, many accelerators have been proposed to cope with their computation kernels. However, these accelerators access external memory frequently due to the large size of deep neural network models, suffering from the von Neumann bottleneck. Moreover, as privacy issues become more critical, on-device training is emerging as a solution. On-device training is challenging, however, because it must run under a limited power budget while requiring far more computation and memory access than inference. In this paper, we present T-PIM, an energy-efficient processing-in-memory (PIM) architecture that supports end-to-end on-device training. Its macro design includes an 8T-SRAM cell-based PIM block that computes in-memory AND operations, together with three computational datapaths for end-to-end training. The three paths integrate arithmetic units for forward propagation, backward propagation, and gradient calculation with weight update, respectively, allowing the weight data stored in the memory to remain stationary. T-PIM also supports variable bit precision to cover various ML scenarios: fully variable input bit precision with 2-bit, 4-bit, 8-bit, or 16-bit weight precision for forward propagation, and the same input bit precision with 16-bit weight precision for backward propagation. In addition, T-PIM implements sparsity handling schemes that skip computation for zero input data and turn off the arithmetic units for zero weight data, reducing both unnecessary computation and leakage power. Finally, we fabricate the T-PIM chip on a 5.04 mm² die in a 28-nm CMOS logic process. It operates at 50-280 MHz with a supply voltage of 0.75-1.05 V, dissipating 5.25-51.23 mW in inference and 6.10-37.75 mW in training. As a result, it achieves 17.90-161.08 TOPS/W energy efficiency for inference with 1-bit activations and 2-bit weights, and 0.84-7.59 TOPS/W for training with 8-bit activations/errors and 16-bit weights. In conclusion, T-PIM is the first PIM chip that supports end-to-end training, demonstrating a 2.02 times performance improvement over the latest PIM chip that only partially supports training.
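To make the bit-serial, AND-based computation and the input sparsity handling described in the abstract concrete, the following minimal Python sketch models the scheme at a functional level. It is our own illustration under stated assumptions, not the authors' implementation: unsigned activations, a behavioral model of the 8T-SRAM AND operation, and the hypothetical function bit_serial_mac with made-up parameters. It computes a dot product one input bit-plane per cycle, gating the stationary weights with each input bit and skipping all-zero bit-planes.

import numpy as np

def bit_serial_mac(x, w, in_bits=8):
    """Dot product of unsigned in_bits-wide inputs x with weights w,
    processed one input bit-plane per cycle, as in AND-based PIM.

    All-zero bit-planes are skipped entirely, mimicking the input
    sparsity handling that saves whole compute cycles.
    """
    acc = 0
    for b in range(in_bits):                  # one "cycle" per input bit-plane
        plane = (x >> b) & 1                  # current bit of every input
        if not plane.any():                   # sparsity: skip all-zero planes
            continue
        # In-memory AND: each 1-bit input gates the stored weight;
        # an adder tree then sums the surviving weights.
        partial = int(np.dot(plane, w))
        acc += partial << b                   # shift-and-add accumulation
    return acc

x = np.array([3, 0, 5, 0], dtype=np.int64)   # activations (zeros create sparsity)
w = np.array([2, 7, 1, 4], dtype=np.int64)   # weights held stationary in SRAM
assert bit_serial_mac(x, w) == int(np.dot(x, w))
print(bit_serial_mac(x, w))                  # 11

In this model, every skipped all-zero bit-plane removes an entire compute cycle, which is how input sparsity translates directly into energy savings; weight sparsity, per the abstract, is instead handled by turning off idle arithmetic units to cut leakage power rather than by skipping cycles.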
Pages: 354-366
Page count: 13