A Parametrizable High-Level Synthesis Library for Accelerating Neural Networks on FPGAs

Cited by: 0
Authors
Lester Kalms
Pedram Amini Rad
Muhammad Ali
Arsany Iskander
Diana Göhringer
Affiliations
[1] Technische Universität Dresden
[2] German University in Cairo
Source
Journal of Signal Processing Systems | 2021 / Vol. 93
Keywords
High-level synthesis; Neural networks; FPGA; Hardware acceleration; Library;
DOI
Not available
Abstract
In recent years, Convolutional Neural Networks (CNNs) have been incorporated into a large number of applications, including multimedia retrieval and image classification. However, CNN-based algorithms are computationally and resource intensive and therefore difficult to use in embedded systems. FPGA-based accelerators are becoming increasingly popular in research and industry due to their flexibility and energy efficiency. However, the available resources and the size of the on-chip memory can limit the performance of an FPGA accelerator for CNNs. This work proposes a High-Level Synthesis (HLS) library for CNN algorithms. It contains seven different streaming-capable CNN functions (plus two conversion functions) for creating large neural networks with deep pipelines. The functions have many parameter settings (e.g., for resolution, feature maps, data types, kernel size, parallelization, and accuracy), which also enable compile-time optimizations. Our functions are integrated into the HiFlipVX library, an open-source HLS FPGA library for image processing and object detection. This makes it possible to implement different types of computer vision applications with one library. Thanks to the various configuration and parallelization options of the library functions, it is possible to implement a high-performance, scalable, and resource-efficient system, as our evaluation of the MobileNets algorithm shows.
Pages: 513-529
Page count: 16