Optimizing Deep Learning Acceleration on FPGA for Real-Time and Resource-Efficient Image Classification

Cited by: 2

Authors:

Khaki, Ahmad Mouri Zadeh [1]

Choi, Ahyoung [1]

Affiliations:

[1] Gachon Univ, Dept AI & Software, Seongnam Si 13120, South Korea

Source:

APPLIED SCIENCES-BASEL | 2025 / Vol. 15 / Issue 01

Keywords:

AI hardware acceleration; convolutional neural network (CNN); deep learning; field-programmable gate array (FPGA); transfer learning; TO-DIGITAL CONVERTER; DESIGN; IMPLEMENTATION; EYE; CNN;

DOI:

10.3390/app15010422

Chinese Library Classification (CLC) number:

O6 [Chemistry];

Discipline classification code:

0703;

Abstract:

Deep learning (DL) has revolutionized image classification, yet deploying convolutional neural networks (CNNs) on edge devices for real-time applications remains a significant challenge due to constraints in computation, memory, and power efficiency. This work presents an optimized implementation of VGG16 and VGG19, two widely used CNN architectures, for classifying the CIFAR-10 dataset using transfer learning on field-programmable gate arrays (FPGAs). Utilizing the Xilinx Vitis-AI and TensorFlow2 frameworks, we adapt VGG16 and VGG19 for FPGA deployment through quantization, compression, and hardware-specific optimizations. Our implementation achieves high classification accuracy, with Top-1 accuracy of 89.54% and 87.47% for VGG16 and VGG19, respectively, while delivering significant reductions in inference latency (7.29x and 6.6x compared to CPU-based alternatives). These results highlight the suitability of our approach for resource-efficient, real-time edge applications. Key contributions include a detailed methodology for combining transfer learning with FPGA acceleration, an analysis of hardware resource utilization, and performance benchmarks. This work underscores the potential of FPGA-based solutions to enable scalable, low-latency DL deployments in domains such as autonomous systems, IoT, and mobile devices.
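The abstract mentions quantization and a latency reduction relative to a CPU baseline without giving details. As a rough illustration only, the sketch below shows one common scheme for 8-bit fixed-point accelerators: symmetric quantization with a power-of-two scale, plus the ratio behind a speedup figure. Whether this matches the quantizer actually used in the paper is an assumption; the function names, the `frac_bits` parameter, the sample weights, and the latency values are all hypothetical, and only the 7.29x ratio comes from the abstract.

```python
def quantize_int8(values, frac_bits):
    """Map floats to signed 8-bit integers using a power-of-two scale.

    frac_bits is a hypothetical parameter: the number of fractional bits,
    so the scale factor is 2 ** frac_bits.
    """
    scale = 2 ** frac_bits
    quantized = []
    for v in values:
        q = round(v * scale)            # nearest integer step
        q = max(-128, min(127, q))      # saturate to the int8 range
        quantized.append(q)
    return quantized


def dequantize_int8(quantized, frac_bits):
    """Recover approximate float values from the 8-bit integers."""
    scale = 2 ** frac_bits
    return [q / scale for q in quantized]


def speedup(cpu_latency_ms, fpga_latency_ms):
    """Latency reduction factor: baseline latency over accelerated latency."""
    return cpu_latency_ms / fpga_latency_ms


# Hypothetical weights, not taken from the paper.
weights = [0.5, -0.25, 0.1, 1.5]
q = quantize_int8(weights, frac_bits=6)
restored = dequantize_int8(q, frac_bits=6)
print(q)         # integer codes stored on the accelerator
print(restored)  # values the accelerator effectively computes with

# Hypothetical latencies chosen only to reproduce the 7.29x ratio.
print(speedup(72.9, 10.0))
```

Values that fall between two integer steps (such as 0.1 here) pick up a small rounding error, and values outside the representable range saturate, which is one reason quantized accuracy can differ from the floating-point model.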

Pages: 13
