Genetic Algorithm-Based Energy-Aware CNN Quantization for Processing-In-Memory Architecture

Cited by: 10
Authors
Kang, Beomseok [1 ]
Lu, Anni [1 ]
Long, Yun [2 ]
Kim, Daehyun [1 ]
Yu, Shimeng [1 ]
Mukhopadhyay, Saibal [1 ]
Affiliations
[1] Georgia Inst Technol, Dept Elect & Comp Engn, Atlanta, GA 30332 USA
[2] Google, Mountain View, CA 94043 USA
Funding
U.S. National Science Foundation;
Keywords
Quantization; convolutional neural network; genetic algorithms; processing-in-memory;
DOI
10.1109/JETCAS.2021.3127129
CLC Classification Number
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Subject Classification Code
0808; 0809;
Abstract
We present a genetic algorithm-based energy-aware convolutional neural network (CNN) quantization framework (EGQ) for processing-in-memory (PIM) architectures. EGQ predicts layer-wise dynamic energy consumption from the number of ADC accesses, and automatically optimizes the layer-wise weight/activation bitwidths to reduce total dynamic energy with negligible accuracy loss. Because EGQ requires only basic CNN model information, such as weight/activation dimensions, to predict dynamic energy, it can compress a wide variety of models. We analyze the effect of EGQ on the area, dynamic energy, and energy efficiency of PIM architectures for VGG-19, ResNet-18, and ResNet-50 using NeuroSim, and observe that EGQ effectively reduces the dynamic energy of CNN models across PIM designs built on SRAM, RRAM, and FeFET technologies. EGQ achieves an average weight bitwidth of 6.1 bits and an average activation bitwidth of 6.3 bits on ResNet-18, improving energy efficiency by 6.5x over the 16-bit model. On ResNet-18 with CIFAR-10, average weight and activation bitwidths of 2.5 and 3.9 bits are achieved. Both results come with a negligible accuracy loss of 2%.
Pages: 649-662
Page count: 14
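
The abstract describes a genetic search over layer-wise weight/activation bitwidths driven by a predicted-energy objective. Below is a minimal, self-contained Python sketch of that style of search. Everything here is an illustrative assumption, not the authors' implementation: the function names (egq_style_search, dynamic_energy, accuracy_loss), the layer sizes, and the toy energy and accuracy models are hypothetical stand-ins, whereas the paper's actual energy predictor is based on ADC access counts in PIM arrays modeled with NeuroSim.

```python
import random

# Toy stand-in for a CNN: per-layer (num_weights, num_activations).
# These sizes are made up for illustration.
LAYERS = [(1728, 32768), (36864, 32768), (73728, 16384), (147456, 16384)]
BIT_CHOICES = [2, 4, 6, 8, 16]
ACC_LOSS_BUDGET = 0.02  # the paper reports ~2% accuracy loss as negligible

def dynamic_energy(bits):
    """Toy energy proxy. In the paper, energy is predicted from ADC access
    counts; here we simply weight each layer's size by its bitwidths."""
    return sum(w * wb + a * ab
               for (w, a), (wb, ab) in zip(LAYERS, bits))

def accuracy_loss(bits):
    """Hypothetical placeholder: a real framework would evaluate the
    quantized CNN. This toy version just penalizes aggressive bitwidths."""
    return sum(max(0, 8 - wb) + max(0, 8 - ab)
               for wb, ab in bits) / (200.0 * len(bits))

def fitness(bits):
    # Treat the accuracy budget as a hard constraint via a large penalty.
    penalty = 1e12 if accuracy_loss(bits) > ACC_LOSS_BUDGET else 0.0
    return dynamic_energy(bits) + penalty

def random_individual():
    # One (weight_bits, activation_bits) pair per layer.
    return [(random.choice(BIT_CHOICES), random.choice(BIT_CHOICES))
            for _ in LAYERS]

def crossover(a, b):
    # Single-point crossover over the layer axis.
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(bits, rate=0.1):
    # Resample each layer's bitwidth pair with probability `rate`.
    return [(random.choice(BIT_CHOICES), random.choice(BIT_CHOICES))
            if random.random() < rate else pair for pair in bits]

def egq_style_search(pop_size=40, generations=50):
    pop = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        parents = pop[:pop_size // 2]  # elitist selection: keep best half
        children = [mutate(crossover(random.choice(parents),
                                     random.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    return min(pop, key=fitness)

if __name__ == "__main__":
    best = egq_style_search()
    print("per-layer (weight, activation) bits:", best)
    print("predicted energy (toy units):", dynamic_energy(best))
```

The design point this sketch captures is that the energy model, not accuracy, is the objective: accuracy only enters as a constraint (here a hard penalty), which mirrors the abstract's goal of minimizing dynamic energy subject to negligible accuracy loss.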