Genetic Algorithm-Based Energy-Aware CNN Quantization for Processing-In-Memory Architecture

Cited by: 10
Authors
Kang, Beomseok [1 ]
Lu, Anni [1 ]
Long, Yun [2 ]
Kim, Daehyun [1 ]
Yu, Shimeng [1 ]
Mukhopadhyay, Saibal [1 ]
Affiliations
[1] Georgia Inst Technol, Dept Elect & Comp Engn, Atlanta, GA 30332 USA
[2] Google, Mountain View, CA 94043 USA
Funding
U.S. National Science Foundation;
Keywords
Quantization; convolutional neural network; genetic algorithms; processing-in-memory;
DOI
10.1109/JETCAS.2021.3127129
CLC Classification Number
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Subject Classification Code
0808; 0809;
Abstract
We present a genetic algorithm-based energy-aware convolutional neural network (CNN) quantization framework (EGQ) for processing-in-memory (PIM) architectures. EGQ predicts layer-wise dynamic energy consumption from the number of ADC accesses, and automatically optimizes the layer-wise weight/activation bitwidths to reduce total dynamic energy with negligible accuracy loss. Because EGQ requires only basic CNN model information, such as weight/activation dimensions, to predict dynamic energy, it can compress a wide variety of models. We analyze the effect of EGQ on the area, dynamic energy, and energy efficiency of PIM architectures for VGG-19, ResNet-18, and ResNet-50 using NeuroSim, and observe that EGQ effectively reduces the dynamic energy of CNN models across PIM designs built on SRAM, RRAM, and FeFET technologies. EGQ achieves an average weight bitwidth of 6.1 bits and an average activation bitwidth of 6.3 bits on ResNet-18, improving energy efficiency by 6.5x over the 16-bit model. On ResNet-18 with CIFAR-10, average weight and activation bitwidths of 2.5 and 3.9 bits are achieved. Both results come with a negligible accuracy loss of 2%.
Pages: 649-662
Page count: 14
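
The abstract describes a genetic search over layer-wise weight/activation bitwidths driven by a predicted-energy objective. Below is a minimal, self-contained Python sketch of that style of search. Everything here is an illustrative assumption, not the authors' implementation: the function names (egq_style_search, dynamic_energy, accuracy_loss), the layer sizes, and the toy energy and accuracy models are hypothetical stand-ins, whereas the paper's actual energy predictor is based on ADC access counts in PIM arrays modeled with NeuroSim.

```python
import random

# Toy stand-in for a CNN: per-layer (num_weights, num_activations).
# These sizes are made up for illustration.
LAYERS = [(1728, 32768), (36864, 32768), (73728, 16384), (147456, 16384)]
BIT_CHOICES = [2, 4, 6, 8, 16]
ACC_LOSS_BUDGET = 0.02  # the paper reports ~2% accuracy loss as negligible

def dynamic_energy(bits):
    """Toy energy proxy. In the paper, energy is predicted from ADC access
    counts; here we simply weight each layer's size by its bitwidths."""
    return sum(w * wb + a * ab
               for (w, a), (wb, ab) in zip(LAYERS, bits))

def accuracy_loss(bits):
    """Hypothetical placeholder: a real framework would evaluate the
    quantized CNN. This toy version just penalizes aggressive bitwidths."""
    return sum(max(0, 8 - wb) + max(0, 8 - ab)
               for wb, ab in bits) / (200.0 * len(bits))

def fitness(bits):
    # Treat the accuracy budget as a hard constraint via a large penalty.
    penalty = 1e12 if accuracy_loss(bits) > ACC_LOSS_BUDGET else 0.0
    return dynamic_energy(bits) + penalty

def random_individual():
    # One (weight_bits, activation_bits) pair per layer.
    return [(random.choice(BIT_CHOICES), random.choice(BIT_CHOICES))
            for _ in LAYERS]

def crossover(a, b):
    # Single-point crossover over the layer axis.
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(bits, rate=0.1):
    # Resample each layer's bitwidth pair with probability `rate`.
    return [(random.choice(BIT_CHOICES), random.choice(BIT_CHOICES))
            if random.random() < rate else pair for pair in bits]

def egq_style_search(pop_size=40, generations=50):
    pop = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        parents = pop[:pop_size // 2]  # elitist selection: keep best half
        children = [mutate(crossover(random.choice(parents),
                                     random.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    return min(pop, key=fitness)

if __name__ == "__main__":
    best = egq_style_search()
    print("per-layer (weight, activation) bits:", best)
    print("predicted energy (toy units):", dynamic_energy(best))
```

The design point this sketch captures is that the energy model, not accuracy, is the objective: accuracy only enters as a constraint (here a hard penalty), which mirrors the abstract's goal of minimizing dynamic energy subject to negligible accuracy loss.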