Bit Prudent In-Cache Acceleration of Deep Convolutional Neural Networks

Cited by: 29
Authors
Wang, Xiaowei [1 ]
Yu, Jiecao [1 ]
Augustine, Charles [2 ]
Iyer, Ravi [2 ]
Das, Reetuparna [1 ]
Affiliations
[1] Univ Michigan, Ann Arbor, MI 48109 USA
[2] Intel Corp, Santa Clara, CA 95051 USA
Funding
US National Science Foundation
Keywords
In-Memory Computing; Cache; Neural Network Pruning; Low Precision Neural Network;
DOI
10.1109/HPCA.2019.00029
Chinese Library Classification
TP3 [Computing Technology, Computer Technology]
Discipline Code
0812
Abstract
We propose Bit Prudent In-Cache Acceleration of Deep Convolutional Neural Networks, an in-SRAM architecture for accelerating Convolutional Neural Network (CNN) inference by leveraging network redundancy and massive parallelism. The network redundancy is exploited in two ways. First, we prune and fine-tune the trained network model and develop two distinct methods, coalescing and overlapping, to run inference efficiently with the resulting sparse models. Second, we propose an architecture for network models with reduced bit width that leverages bit-serial computation. The proposed architecture achieves a 17.7x/3.7x speedup over a server-class CPU/GPU and a 1.6x speedup over the most closely related in-cache accelerator, with a 2% area overhead per processor die and no loss in top-1 accuracy for AlexNet. With a relaxed accuracy limit, our tunable architecture achieves even higher speedups.
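The bit-serial computation the abstract refers to can be illustrated with a short sketch. The snippet below is a NumPy illustration of the general technique, not the paper's in-SRAM implementation: the hypothetical helper bit_serial_dot evaluates an unsigned integer dot product one weight bit plane at a time, so a model quantized to fewer bits needs proportionally fewer passes (compute cycles in hardware), which is how a reduced bit width translates into speedup.

```python
import numpy as np

def bit_serial_dot(w: np.ndarray, x: np.ndarray, bits: int = 8) -> int:
    """Dot product with the weight operand processed one bit plane at a time.

    Software sketch only (unsigned weights): each loop iteration handles a
    single bit position of `w`. An in-SRAM design would evaluate the AND and
    the accumulation across thousands of bit-lines in parallel, so halving
    the weight bit width halves the number of passes.
    """
    acc = 0
    for b in range(bits):
        plane = (w >> b) & 1          # bit plane b: one bit of every weight
        acc += int(plane @ x) << b    # partial sum, shifted into position
    return acc

# Sanity check against the direct dot product.
rng = np.random.default_rng(0)
w = rng.integers(0, 256, size=1024)   # 8-bit unsigned weights
x = rng.integers(0, 256, size=1024)   # 8-bit unsigned activations
assert bit_serial_dot(w, x, bits=8) == int(w @ x)
```

Under these assumptions, calling bit_serial_dot with bits=4 on 4-bit weights would take half the passes of the 8-bit case while producing an exact result, which mirrors the latency/precision trade-off the tunable architecture exposes.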
Pages: 81-93
Page count: 13
Related Papers (50 in total)
  • [1] Neural Cache: Bit-Serial In-Cache Acceleration of Deep Neural Networks
    Eckert, Charles
    Wang, Xiaowei
    Wang, Jingcheng
    Subramaniyan, Arun
    Iyer, Ravi
    Sylvester, Dennis
    Blaauw, David
    Das, Reetuparna
    IEEE MICRO, 2019, 39 (03) : 11 - 19
  • [2] Neural Cache: Bit-Serial In-Cache Acceleration of Deep Neural Networks
    Eckert, Charles
    Wang, Xiaowei
    Wang, Jingcheng
    Subramaniyan, Arun
    Iyer, Ravi
    Sylvester, Dennis
    Blaauw, David
    Das, Reetuparna
    2018 ACM/IEEE 45TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA), 2018, : 383 - 396
  • [3] Elastic Significant Bit Quantization and Acceleration for Deep Neural Networks
    Gong, Cheng
    Lu, Ye
    Xie, Kunpeng
    Jin, Zongming
    Li, Tao
    Wang, Yanzhi
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2022, 33 (11) : 3178 - 3193
  • [4] PROCESSING CONVOLUTIONAL NEURAL NETWORKS ON CACHE
    Vieira, Joao
    Roma, Nuno
    Falcao, Gabriel
    Tomas, Pedro
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 1658 - 1662
  • [5] Acceleration of Deep Convolutional Neural Networks Using Adaptive Filter Pruning
    Singh, Pravendra
    Verma, Vinay Kumar
    Rai, Piyush
    Namboodiri, Vinay P.
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2020, 14 (04) : 838 - 847
  • [6] Learning Filter Pruning Criteria for Deep Convolutional Neural Networks Acceleration
    He, Yang
    Ding, Yuhang
    Liu, Ping
    Zhu, Linchao
    Zhang, Hanwang
    Yang, Yi
    2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020, : 2006 - 2015
  • [7] Towards Acceleration of Deep Convolutional Neural Networks using Stochastic Computing
    Li, Ji
    Ren, Ao
    Li, Zhe
    Ding, Caiwen
    Yuan, Bo
    Qiu, Qinru
    Wang, Yanzhi
    2017 22ND ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE (ASP-DAC), 2017, : 115 - 120
  • [8] Caffeine: Towards Uniformed Representation and Acceleration for Deep Convolutional Neural Networks
    Zhang, Chen
    Fang, Zhenman
    Zhou, Peipei
    Pan, Peichen
    Cong, Jason
    2016 IEEE/ACM INTERNATIONAL CONFERENCE ON COMPUTER-AIDED DESIGN (ICCAD), 2016,
  • [9] Caffeine: Toward Uniformed Representation and Acceleration for Deep Convolutional Neural Networks
    Zhang, Chen
    Sun, Guangyu
    Fang, Zhenman
    Zhou, Peipei
    Pan, Peichen
    Cong, Jason
    IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 2019, 38 (11) : 2072 - 2085
  • [10] Filter pruning via annealing decaying for deep convolutional neural networks acceleration
    Huang, Jiawen
    Xiong, Liyan
    Huang, Xiaohui
    Chen, Qingsen
    Huang, Peng
    CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2025, 28 (02):