User Driven FPGA-Based Design Automated Framework of Deep Neural Networks for Low-Power Low-Cost Edge Computing

Cited by: 13
Authors
Belabed, Tarek [1 ,2 ,3 ]
Coutinho, Maria Gracielly F. [4 ]
Fernandes, Marcelo A. C. [4 ]
Sakuyama, Carlos Valderrama [1 ]
Souani, Chokri [5 ]
Affiliations
[1] Univ Mons, Fac Polytech, SEMi, B-7000 Mons, Belgium
[2] Univ Sousse, Ecole Natl Ingenieurs Sousse, Sousse 4000, Tunisia
[3] Univ Monastir, Fac Sci, Lab Microelect & Instrumentat, Monastir 5019, Tunisia
[4] Univ Fed Rio Grande do Norte, Dept Comp & Automat Engn, BR-59078970 Natal, RN, Brazil
[5] Univ Sousse, Inst Super Sci Appl & Technol Sousse, Sousse 4003, Tunisia
Keywords
Field programmable gate arrays; Topology; Optimization; Hardware; Edge computing; Computer architecture; Tools; Deep learning; electronic design automation; edge computing; FPGA; low power systems; ARTIFICIAL-INTELLIGENCE; STATE;
DOI
10.1109/ACCESS.2021.3090196
CLC Number
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Deep Learning techniques have been successfully applied to many Artificial Intelligence (AI) problems. However, owing to topologies with many hidden layers, Deep Neural Networks (DNNs) have high computational complexity, which makes them difficult to deploy in contexts tightly constrained by requirements such as performance, real-time processing, or energy efficiency. Numerous hardware/software optimization techniques using GPUs, ASICs, and reconfigurable computing (i.e., FPGAs) have been proposed in the literature. With FPGAs, highly specialized architectures have been developed to strike an optimal balance between high speed and low power. However, when targeting edge computing, user requirements and hardware constraints must be met efficiently. In this work, we therefore focus on reconfigurable embedded systems based on the Xilinx ZYNQ SoC and on popular DNNs that can be implemented on embedded edge devices, improving performance per watt while maintaining accuracy. In this context, we propose an automated framework for implementing hardware-accelerated DNN architectures. The framework provides an end-to-end solution that facilitates the efficient deployment of topologies on FPGAs by combining custom hardware scalability with optimization strategies. Comparisons with the state of the art and experimental results demonstrate that the architectures generated by our framework offer the best compromise between performance, energy consumption, and system cost. For instance, a low-power (0.266 W) DNN topology generated for the MNIST database achieved a high throughput of 3,626 FPS.
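As a back-of-the-envelope check on the abstract's "performance per watt" claim, the two reported figures (3,626 FPS at 0.266 W on MNIST) can be combined directly; the calculation below is only an illustration of the implied efficiency, not a number stated in the record itself.

```python
# Performance-per-watt implied by the figures reported in the abstract:
# 3,626 frames/s at 0.266 W on the MNIST benchmark.
throughput_fps = 3626
power_w = 0.266

fps_per_watt = throughput_fps / power_w
print(f"{fps_per_watt:.0f} FPS/W")  # prints 13632 FPS/W
```

At roughly 13,600 FPS/W, this is the kind of efficiency figure typically used when comparing FPGA edge accelerators against GPU baselines.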
Pages: 89162-89180
Number of pages: 19
References
66 in total; first 10 shown below
[1] A. Limited, 2020, 0101 A LIM.
[2] [Anonymous], 2021, DIGIKEY.
[3] [Anonymous], 2016, GPU vs FPGA performance comparison white paper 2.
[4] [Anonymous], 2012, P ICML WORKSHOP UNSU.
[5] Atkinson, Katie; Bench-Capon, Trevor; Bollegala, Danushka. "Explanation in AI and law: Past, present and future," Artificial Intelligence, vol. 289, 2020.
[6] AWS, 2019, AMAZ EC2 F1 INST.
[7] Belabed, Tarek; Coutinho, Maria Gracielly F.; Fernandes, Marcelo A. C.; Valderrama, Carlos; Souani, Chokri. "Low Cost and Low Power Stacked Sparse Autoencoder Hardware Acceleration for Deep Learning Edge Computing Applications," 2020 5th International Conference on Advanced Technologies for Signal and Image Processing (ATSIP 2020), 2020.
[8] Ben Fredj, Hana; Bouguezzi, Safa; Souani, Chokri. "Face recognition in unconstrained environment with CNN," The Visual Computer, vol. 37, no. 2, pp. 217-226, 2021.
[9] Blaiech, Ahmed Ghazi; Ben Khalifa, Khaled; Valderrama, Carlos; Fernandes, Marcelo A. C.; Bedoui, Mohamed Hedi. "A Survey and Taxonomy of FPGA-based Deep Learning Accelerators," Journal of Systems Architecture, vol. 98, pp. 331-345, 2019.
[10] Coutinho, Maria G. F.; Torquato, Matheus F.; Fernandes, Marcelo A. C. "Deep Neural Network Hardware Implementation Based on Stacked Sparse Autoencoder," IEEE Access, vol. 7, pp. 40674-40694, 2019.