Exploring Bit-Level Sparsity for Partial Sum Quantization in Computing-In-Memory Accelerator

Cited by: 1
Authors
Bai, Jinyu [1 ]
Sun, Sifan [1 ]
Kang, Wang [1 ]
Affiliations
[1] Beihang Univ, Sch Integrated Circuit Sci & Engn, Beijing, Peoples R China
Source
2023 IEEE 12TH NON-VOLATILE MEMORY SYSTEMS AND APPLICATIONS SYMPOSIUM (NVMSA) | 2023
Keywords
Computing-In-Memory (CIM); partial sum quantization (PSQ); bit-level sparsity; post-training quantization (PTQ);
DOI
10.1109/NVMSA58981.2023.00021
CLC Classification
TP3 [Computing technology; computer technology];
Discipline Code
0812;
Abstract
Computing-In-Memory (CIM) has demonstrated great potential in boosting the performance and energy efficiency of convolutional neural networks. However, due to the limited size and precision of its memory array, the input and weight matrices of convolution operations have to be split into sub-matrices or even binary sub-matrices, especially when using bit-slicing and single-level cells (SLCs). A large number of partial sums are generated as a result. To maintain high computing precision, high-resolution analog-to-digital converters (ADCs) are used to read out the partial sums, at the cost of considerable area and substantial energy overhead. Partial sum quantization (PSQ), a technique that can greatly reduce the required ADC resolution, remains sparsely studied. This paper proposes a novel PSQ approach for CIM-based accelerators that exploits the bit-level sparsity of neural networks. A reparametrized clipping function is then proposed to find the optimal clipping threshold for the ADCs. Finally, we develop a general post-training quantization framework for PSQ-CIM. Experiments on a variety of neural networks and datasets show that, in a typical case (ResNet-18 on ImageNet), the required ADC resolution can be reduced to 2 bits with little accuracy loss (approximately 0.92%) and hardware efficiency can be improved by 199.7%.
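To make the abstract's flow concrete, the sketch below (plain NumPy, not the authors' implementation) illustrates bit-sliced matrix-vector multiplication with clipped, low-resolution partial-sum readout: weights are decomposed into binary bit planes as with single-level cells, each bit plane yields a partial sum per column, and each partial sum passes through a clipping quantizer standing in for a 2-bit ADC before digital shift-and-add. The function names, the fixed clipping threshold alpha, and the bit widths are illustrative assumptions; the paper's actual contribution, choosing the threshold via a reparametrized clipping function and bit-level sparsity, is not reproduced here.

import numpy as np

def quantize_partial_sum(psum, alpha, adc_bits=2):
    # Clip partial sums to [0, alpha] and round to 2**adc_bits - 1 uniform steps,
    # mimicking a low-resolution ADC readout of one crossbar column.
    levels = 2 ** adc_bits - 1
    step = alpha / levels
    return np.round(np.clip(psum, 0.0, alpha) / step) * step

def bit_sliced_matvec(w_int, x, w_bits=8, adc_bits=2, alpha=16.0):
    # w_int: unsigned integer weight matrix (rows = output neurons);
    # x: binary input vector, e.g. one bit plane of a bit-serial input stream.
    acc = np.zeros(w_int.shape[0])
    for b in range(w_bits):
        bit_plane = (w_int >> b) & 1                          # binary sub-matrix stored in SLCs
        psum = bit_plane @ x                                  # MAC along each column (analog in real CIM)
        psum_q = quantize_partial_sum(psum, alpha, adc_bits)  # clipped low-resolution ADC
        acc += psum_q * (2 ** b)                              # digital shift-and-add over bit planes
    return acc

# Example: 16 outputs, 64 binary inputs, 8-bit unsigned weights.
rng = np.random.default_rng(0)
w_int = rng.integers(0, 256, size=(16, 64))
x = rng.integers(0, 2, size=64)
print(bit_sliced_matvec(w_int, x))

With adc_bits=2 each column readout collapses to four levels, so the accuracy of the shift-and-add result depends on how well alpha matches the distribution of the partial sums; calibrating that threshold after training is the role the abstract assigns to the reparametrized clipping function and the post-training quantization framework.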
Pages: 32-37
Page count: 6