Bit Prudent In-Cache Acceleration of Deep Convolutional Neural Networks

Cited by: 29
Authors
Wang, Xiaowei [1 ]
Yu, Jiecao [1 ]
Augustine, Charles [2 ]
Iyer, Ravi [2 ]
Das, Reetuparna [1 ]
Affiliations
[1] Univ Michigan, Ann Arbor, MI 48109 USA
[2] Intel Corp, Santa Clara, CA 95051 USA
Funding
US National Science Foundation;
Keywords
In-Memory Computing; Cache; Neural Network Pruning; Low Precision Neural Network;
DOI
10.1109/HPCA.2019.00029
CLC Number
TP3 [Computing Technology, Computer Technology];
Discipline Code
0812;
Abstract
We propose Bit Prudent In-Cache Acceleration of Deep Convolutional Neural Networks, an in-SRAM architecture that accelerates Convolutional Neural Network (CNN) inference by leveraging network redundancy and massive parallelism. The network redundancy is exploited in two ways. First, we prune and fine-tune the trained network model and develop two distinct methods, coalescing and overlapping, to run inference efficiently with sparse models. Second, we propose an architecture for network models with reduced bit width that leverages bit-serial computation. Our proposed architecture achieves a 17.7x/3.7x speedup over a server-class CPU/GPU, and a 1.6x speedup over the relevant in-cache accelerator, with a 2% area overhead per processor die and no loss in top-1 accuracy for AlexNet. With a relaxed accuracy limit, our tunable architecture achieves even higher speedups.
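As a software illustration of the two ideas in the abstract, the short Python sketch below mimics magnitude pruning of a weight tensor and a bit-serial dot product that consumes low-precision weights one bit-plane at a time. It is not the paper's in-SRAM design: the function names, the 4-bit weight width, the 50% sparsity target, and the array sizes are all illustrative assumptions, and in the accelerator itself the per-bit partial sums come from bit-line operations inside the cache arrays rather than from NumPy calls.

import numpy as np

def magnitude_prune(weights, sparsity=0.5):
    # Zero out the smallest-magnitude weights so that roughly `sparsity` of the
    # entries become zero; the pruned model would then be fine-tuned.
    flat = np.abs(weights).ravel()
    threshold = np.sort(flat)[int(sparsity * flat.size)]
    return np.where(np.abs(weights) < threshold, 0, weights)

def bit_serial_dot(weights_q, activations, bit_width=4):
    # Dot product of unsigned `bit_width`-bit integer weights with integer
    # activations, accumulated one weight bit-plane at a time (LSB first).
    acc = 0
    for b in range(bit_width):
        bit_plane = (weights_q >> b) & 1               # 0/1 plane for bit position b
        partial = int(np.dot(bit_plane, activations))  # AND-and-accumulate style step
        acc += partial << b                            # scale the partial sum by 2^b
    return acc

rng = np.random.default_rng(0)
w = rng.integers(0, 16, size=64)   # illustrative 4-bit unsigned weights
x = rng.integers(0, 8, size=64)    # illustrative integer activations
w_sparse = magnitude_prune(w)      # sparse weights a coalesced/overlapped layout could exploit
assert bit_serial_dot(w, x) == int(np.dot(w, x))   # bit-serial result matches the direct dot product

The final check confirms that summing the bit-plane partial products, each shifted by its bit position, reproduces the full-precision integer dot product.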
Pages: 81 - 93
Page count: 13
Related Papers
50 in total
  • [31] Fpar: filter pruning via attention and rank enhancement for deep convolutional neural networks acceleration
    Chen, Yanming
    Wu, Gang
    Shuai, Mingrui
    Lou, Shubin
    Zhang, Yiwen
    An, Zhulin
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2024, 15 (07) : 2973 - 2985
  • [32] Bit Efficient Quantization for Deep Neural Networks
    Nayak, Prateeth
    Zhang, David
    Chai, Sek
    FIFTH WORKSHOP ON ENERGY EFFICIENT MACHINE LEARNING AND COGNITIVE COMPUTING - NEURIPS EDITION (EMC2-NIPS 2019), 2019, : 52 - 56
  • [33] MXQN: Mixed quantization for reducing bit-width of weights and activations in deep convolutional neural networks
    Huang, Chenglong
    Liu, Puguang
    Fang, Liang
    APPLIED INTELLIGENCE, 2021, 51 (07) : 4561 - 4574
  • [35] Adversarial Robustness of Multi-bit Convolutional Neural Networks
    Frickenstein, Lukas
    Sampath, Shambhavi Balamuthu
    Mori, Pierpaolo
    Vemparala, Manoj-Rohit
    Fasfous, Nael
    Frickenstein, Alexander
    Unger, Christian
    Passerone, Claudio
    Stechele, Walter
    INTELLIGENT SYSTEMS AND APPLICATIONS, VOL 3, INTELLISYS 2023, 2024, 824 : 157 - 174
  • [36] Plug and Play Deep Convolutional Neural Networks
    Neary, Patrick
    Allan, Vicki
    PROCEEDINGS OF THE 11TH INTERNATIONAL CONFERENCE ON AGENTS AND ARTIFICIAL INTELLIGENCE (ICAART), VOL 2, 2019, : 388 - 395
  • [37] An Efficient Accelerator for Deep Convolutional Neural Networks
    Kuo, Yi-Xian
    Lai, Yeong-Kang
    2020 IEEE INTERNATIONAL CONFERENCE ON CONSUMER ELECTRONICS - TAIWAN (ICCE-TAIWAN), 2020,
  • [38] Elastography mapped by deep convolutional neural networks
    Liu, DongXu
    Kruggel, Frithjof
    Sun, LiZhi
    SCIENCE CHINA-TECHNOLOGICAL SCIENCES, 2021, (07) : 1567 - 1574
  • [39] Predicting enhancers with deep convolutional neural networks
    Min, Xu
    Zeng, Wanwen
    Chen, Shengquan
    Chen, Ning
    Chen, Ting
    Jiang, Rui
    BMC BIOINFORMATICS, 2017, 18
  • [40] Metaphase finding with deep convolutional neural networks
    Moazzen, Yaser
    Capar, Abdulkerim
    Albayrak, Abdulkadir
    Calik, Nurullah
    Toreyin, Behcet Ugur
    BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2019, 52 : 353 - 361