Hardware and Software Optimizations for Accelerating Deep Neural Networks: Survey of Current Trends, Challenges, and the Road Ahead

Cited by: 125
Authors
Capra, Maurizio [1 ]
Bussolino, Beatrice [1 ]
Marchisio, Alberto [2 ]
Masera, Guido [1 ]
Martina, Maurizio [1 ]
Shafique, Muhammad [3 ]
Affiliations
[1] Politecnico di Torino, Department of Electronics and Telecommunications Engineering, I-10129 Turin, Italy
[2] Technische Universität Wien (TU Wien), Institute of Computer Engineering, A-1040 Vienna, Austria
[3] New York University, Division of Engineering, Abu Dhabi, United Arab Emirates
Keywords
Hardware; Neurons; Biological neural networks; Computer architecture; Computational modeling; Field programmable gate arrays; Training; Machine learning; ML; artificial intelligence; AI; deep learning; deep neural networks; DNNs; convolutional neural networks; CNNs; capsule networks; spiking neural networks; VLSI; computer architecture; hardware accelerator; adversarial attacks; data flow; optimization; efficiency; performance; power consumption; energy; area; latency; MODEL COMPRESSION; SPACE EXPLORATION; SPIKING NEURONS; EFFICIENT; MEMORY; ARCHITECTURES; PROCESSOR; DESIGN; METHODOLOGY; COPROCESSOR;
DOI
10.1109/ACCESS.2020.3039858
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Currently, Machine Learning (ML) is becoming ubiquitous in everyday life. Deep Learning (DL) is already present in many applications, ranging from computer vision for medicine to autonomous driving of modern cars, as well as other sectors such as security, healthcare, and finance. However, to achieve such impressive performance, these algorithms employ very deep networks that require significant computational power, both during training and at inference time. A single inference of a DL model may require billions of multiply-and-accumulate (MAC) operations, making DL extremely compute- and energy-hungry. In scenarios where several sophisticated algorithms must be executed with limited energy and low latency, cost-effective hardware platforms capable of energy-efficient DL execution become essential. This paper first introduces the key properties of two brain-inspired models, the Deep Neural Network (DNN) and the Spiking Neural Network (SNN), and then analyzes techniques for producing efficient and high-performance designs. It summarizes and compares state-of-the-art works on the four leading execution platforms: CPU, GPU, FPGA, and ASIC, with particular emphasis on the last two, since they offer greater design flexibility and the potential for high energy efficiency, especially for inference. In addition to hardware solutions, the paper discusses important security issues that DNN and SNN models may face during execution, and offers a comprehensive section on benchmarking that explains how to assess the quality of different networks and of the hardware systems designed for them.
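
To make the scale of the "billions of MAC operations" claim concrete, the following short Python sketch estimates the per-inference MAC count of VGG-16, a standard benchmark network. The layer shapes (3x3 convolutions on a 224x224x3 input, channel widths from the original VGG paper) are an illustrative assumption and are not taken from this record; the sketch is a back-of-the-envelope estimate, not a measurement.

def conv_macs(out_h, out_w, c_in, c_out, k=3):
    # Each of the out_h * out_w * c_out output values needs
    # k * k * c_in multiply-and-accumulate operations.
    return out_h * out_w * c_out * k * k * c_in

# (output side length, input channels, output channels) of each
# convolutional layer in VGG-16: 3x3 kernels, stride 1, padded,
# with 2x2 max-pooling halving the side length between blocks.
VGG16_CONV = [
    (224, 3, 64), (224, 64, 64),
    (112, 64, 128), (112, 128, 128),
    (56, 128, 256), (56, 256, 256), (56, 256, 256),
    (28, 256, 512), (28, 512, 512), (28, 512, 512),
    (14, 512, 512), (14, 512, 512), (14, 512, 512),
]

total = sum(conv_macs(s, s, cin, cout) for s, cin, cout in VGG16_CONV)
total += 25088 * 4096 + 4096 * 4096 + 4096 * 1000  # fully connected layers
print(f"VGG-16, one 224x224 image: {total / 1e9:.2f} GMACs")  # ~15.47

Running this gives roughly 15.5 billion MACs for a single 224x224 image, which illustrates why the abstract describes DL inference as compute- and energy-hungry on resource-constrained platforms.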
Pages: 225134-225180
Number of pages: 47