Optimization strategies for neural network deployment on FPGA: An energy-efficient real-time face detection use case

Cited by: 0
Authors
Al Koutayni, Mhd Rashed [1]
Reis, Gerd [1]
Stricker, Didier [1]
Affiliations
[1] DFKI, German Res Ctr Artificial Intelligence, D-67663 Kaiserslautern, Germany
Keywords
IoT; Edge computing; Artificial intelligence; Deep learning; Optimization; Hardware acceleration; Quantization; High-level synthesis; FPGA; System-on-a-chip (SoC); Face detection; ARCHITECTURE
DOI
10.1016/j.iot.2025.101676
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Field programmable gate arrays (FPGAs) are considered promising platforms for accelerating deep neural networks (DNNs) due to their parallel processing capabilities and energy efficiency. However, deploying DNNs on FPGA platforms for computer vision tasks presents unique challenges, such as limited computational resources, constrained power budgets, and the need for real-time performance. This work presents a set of optimization methodologies to enhance the efficiency of real-time DNN inference on FPGA system-on-a-chip (SoC) platforms. These optimizations include architectural modifications, fixed-point quantization, computation reordering, and parallelization. Additionally, hardware/software partitioning is employed to optimize task allocation between the processing system (PS) and programmable logic (PL), along with system integration and interface configuration. To validate these strategies, we apply them to a baseline face detection DNN (FaceBoxes) as a use case. The proposed techniques not only improve the efficiency of FaceBoxes on FPGA but also provide a roadmap for optimizing other DNN-based applications for resource-constrained platforms. Experimental results on the AMD Xilinx ZCU102 board with VGA resolution (480 x 640 x 3) input demonstrate a significant increase in efficiency, achieving real-time performance while substantially reducing dynamic energy consumption.
Pages: 16