Optimizing Deep Learning Acceleration on FPGA for Real-Time and Resource-Efficient Image Classification

Cited by: 2
Authors
Khaki, Ahmad Mouri Zadeh [1]
Choi, Ahyoung [1]
Affiliations
[1] Gachon Univ, Dept AI & Software, Seongnam Si 13120, South Korea
Source
APPLIED SCIENCES-BASEL | 2025 / Vol. 15 / Issue 01
Keywords
AI hardware acceleration; convolutional neural network (CNN); deep learning; field-programmable gate array (FPGA); transfer learning; TO-DIGITAL CONVERTER; DESIGN; IMPLEMENTATION; EYE; CNN;
DOI
10.3390/app15010422
Chinese Library Classification (CLC) number
O6 [Chemistry]
Discipline classification code
0703
Abstract
Deep learning (DL) has revolutionized image classification, yet deploying convolutional neural networks (CNNs) on edge devices for real-time applications remains a significant challenge due to constraints in computation, memory, and power efficiency. This work presents an optimized implementation of VGG16 and VGG19, two widely used CNN architectures, for classifying the CIFAR-10 dataset using transfer learning on field-programmable gate arrays (FPGAs). Utilizing the Xilinx Vitis-AI and TensorFlow2 frameworks, we adapt VGG16 and VGG19 for FPGA deployment through quantization, compression, and hardware-specific optimizations. Our implementation achieves high classification accuracy, with Top-1 accuracy of 89.54% and 87.47% for VGG16 and VGG19, respectively, while delivering significant reductions in inference latency (7.29x and 6.6x compared to CPU-based alternatives). These results highlight the suitability of our approach for resource-efficient, real-time edge applications. Key contributions include a detailed methodology for combining transfer learning with FPGA acceleration, an analysis of hardware resource utilization, and performance benchmarks. This work underscores the potential of FPGA-based solutions to enable scalable, low-latency DL deployments in domains such as autonomous systems, IoT, and mobile devices.
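The abstract names quantization as one of the steps used to fit VGG16 and VGG19 onto the FPGA, but the record does not describe the scheme. As a rough illustration only, the sketch below shows generic symmetric 8-bit quantization of a few weights in plain Python. It is not the Vitis-AI quantizer and not the authors' method; the function names and the sample weights are hypothetical.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization of floats to signed 8-bit codes."""
    max_abs = max(abs(v) for v in values)
    # Guard against an all-zero tensor, where any scale would work.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest code and clamp to the symmetric int8 range.
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map int8 codes back to approximate float values."""
    return [c * scale for c in codes]


# Hypothetical weights, chosen only to illustrate the rounding error.
weights = [0.50, -1.27, 0.003, 0.9]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each weight is stored as one signed byte plus a shared scale, and the round-trip error stays within half a quantization step. That trade of a small accuracy loss for lower memory and compute cost is the general idea behind quantized edge inference.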
Pages: 13
Related papers
27 in total
  • [1] FPGA Implementation of Complex-Valued Neural Network for Polar-Represented Image Classification
    Ahmad, Maruf
    Zhang, Lei
    Chowdhury, Muhammad E. H.
    [J]. SENSORS, 2024, 24 (03)
  • [2] Akkad Ghattas, 2024, IEEE Transactions on Artificial Intelligence, V5, P1954, DOI 10.1109/TAI.2023.3311776
  • [3] Baitemirova M., 2022, Evaluating Image Classification Models for FPGA Board Status Detection
  • [4] An Approach to the Systematic Characterization of Multitask Accelerated CNN Inference in Edge MPSoCs
    Cilardo, Alessandro
    Maisto, Vincenzo
    Mazzocca, Nicola
    di Torrepadula, Franca Rocco
    [J]. ACM TRANSACTIONS ON EMBEDDED COMPUTING SYSTEMS, 2024, 23 (03)
  • [5] The Cityscapes Dataset for Semantic Urban Scene Understanding
    Cordts, Marius
    Omran, Mohamed
    Ramos, Sebastian
    Rehfeld, Timo
    Enzweiler, Markus
    Benenson, Rodrigo
    Franke, Uwe
    Roth, Stefan
    Schiele, Bernt
    [J]. 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016: 3213 - 3223
  • [6] docs.amd, DPUCZDX8G for Zynq UltraScale+ MPSoCs Product Guide (PG338)
  • [7] docs.amd, Vitis AI User Guide (UG1414)
  • [8] Geiger A, 2012, PROC CVPR IEEE, P3354, DOI 10.1109/CVPR.2012.6248074
  • [9] Angel-Eye: A Complete Design Flow for Mapping CNN Onto Embedded FPGA
    Guo, Kaiyuan
    Sui, Lingzhi
    Qiu, Jiantao
    Yu, Jincheng
    Wang, Junbin
    Yao, Song
    Han, Song
    Wang, Yu
    Yang, Huazhong
    [J]. IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 2018, 37 (01) : 35 - 47
  • [10] Khaki A.M.Z., 2019, J. Electr. Comp. Eng. Innov. (JECEI), V7, P173