Hardware Efficient Convolution Processing Unit for Deep Neural Networks

Cited by: 0
Authors
Hazarika, Anakhi [1]
Poddar, Soumyajit [1]
Rahaman, Hafizur [2]
Affiliations
[1] Indian Inst Informat Technol Guwahati, Gauhati 781015, India
[2] Indian Inst Engn Sci & Technol, Sibpur 711103, Howrah, India
Source
2019 2ND INTERNATIONAL SYMPOSIUM ON DEVICES, CIRCUITS AND SYSTEMS (ISDCS 2019) | 2019
Keywords
Deep Neural Network; CNN Hardware Accelerator; Field Programmable Gate Array (FPGA)
DOI
10.1109/isdcs.2019.8719278
Chinese Library Classification
TP3 [Computing Technology, Computer Technology]
Subject Classification Code
0812
Abstract
A Convolutional Neural Network (CNN) is a type of deep neural network commonly used for object detection and classification. State-of-the-art hardware for training and inference of CNN architectures requires considerable computational and memory resources. CNNs achieve higher accuracy at the cost of high computational complexity and large power consumption. To optimize memory requirements, processing speed, and power, it is crucial to design more efficient accelerator architectures for CNN computation. In this work, the overlap of spatially adjacent data is exploited to parallelize data movement. A fast, reconfigurable hardware accelerator architecture, along with an optimized kernel design suitable for a variety of CNN models, is proposed. Our design achieves a 2.1x computational benefit over state-of-the-art accelerator architectures.
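The abstract's point about overlapping, spatially adjacent data can be illustrated with a simple count of input reads. The sketch below is not the paper's architecture: it assumes a single-channel input with no padding, a square k x k kernel, and a hypothetical reuse scheme in which each window keeps the columns it shares with its left neighbour. The function names are illustrative only.

```python
def output_size(n, k, stride=1):
    """Number of valid window positions along one dimension (no padding)."""
    return (n - k) // stride + 1


def fetches_without_reuse(h, w, k, stride=1):
    """Input values read if every k x k window is fetched from scratch."""
    return output_size(h, k, stride) * output_size(w, k, stride) * k * k


def fetches_with_column_reuse(h, w, k, stride=1):
    """Input values read if each window keeps the columns it shares with
    its left neighbour and loads only the newly exposed columns.
    Assumed scheme for illustration, not taken from the paper."""
    out_h = output_size(h, k, stride)
    out_w = output_size(w, k, stride)
    new_cols = min(stride, k)  # columns not covered by the previous window
    per_row = k * k + (out_w - 1) * k * new_cols
    return out_h * per_row


if __name__ == "__main__":
    h = w = 5
    k = 3
    print(fetches_without_reuse(h, w, k))      # 81
    print(fetches_with_column_reuse(h, w, k))  # 45
```

For a 5 x 5 input and a 3 x 3 kernel at stride 1, reusing shared columns cuts reads from 81 to 45. When the stride equals the kernel size, the windows do not overlap and the two counts coincide.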
Pages: 4