Accelerating Deep Neural Networks implementation: A survey

Cited by: 21
Authors
Dhouibi, Meriam [1 ]
Ben Salem, Ahmed Karim [1 ]
Saidi, Afef [1 ]
Ben Saoud, Slim [1 ]
Affiliations
[1] Univ Carthage, Tunisia Polytech Sch, Adv Syst Lab, BP 743, La Marsa 2078, Tunisia
Keywords
Learning algorithms
DOI
10.1049/cdt2.12016
Chinese Library Classification
TP3 [Computing technology; computer technology]
Discipline code
0812
Abstract
Deep Learning (DL) applications are increasingly used across many fields, yet deploying Deep Neural Networks (DNNs) on embedded devices remains challenging given their massive computation and storage requirements. Since the number of operations and parameters grows with the complexity of the model architecture, performance depends strongly on the target hardware resources and, in particular, on the accelerator's memory footprint. Recent studies have demonstrated the benefits of implementing complex DL applications on various models and platforms. However, hardware accelerators for DL applications must deliver full-speed execution despite constraints of low power, high accuracy, and high throughput. Field Programmable Gate Arrays (FPGAs) are promising platforms for deploying large-scale DNNs while balancing these objectives. Moreover, the growing complexity of DL models has led researchers to apply optimization techniques that make the models more hardware-friendly. Herein, the DL concept is first presented. Then, a detailed description of the optimization techniques used in recent research works is given. Finally, a survey of research works aiming to accelerate the implementation of DNN models on FPGAs is provided.
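One of the hardware-friendly optimization techniques the abstract alludes to is weight quantization. As an illustrative sketch only (not code from the surveyed works; the function names and the symmetric 127-level scheme are assumptions for illustration), the snippet below shows post-training symmetric int8 quantization of a weight tensor, which shrinks the memory footprint roughly fourfold relative to float32:

```python
import numpy as np

def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: w ~= scale * q, q in [-127, 127]."""
    scale = float(np.max(np.abs(weights))) / 127.0
    if scale == 0.0:  # all-zero tensor: nothing to scale
        return np.zeros_like(weights, dtype=np.int8), 1.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float32 tensor from the int8 codes."""
    return q.astype(np.float32) * scale

# Example: quantize a small weight vector and check the reconstruction error,
# which is bounded by half a quantization step (scale / 2).
w = np.array([-0.5, 0.1, 0.25, 0.49], dtype=np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
```

Rounding each weight to the nearest of 255 levels keeps the per-weight error below half a step, which is why such schemes often cost little accuracy while fitting comfortably in FPGA on-chip memory.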
Pages: 79-96
Page count: 18