High-performance Reconfigurable DNN Accelerator on a Bandwidth-limited Embedded System

Cited by: 4
Authors
Hu, Xianghong [1]
Huang, Hongmin [1]
Li, Xueming [1]
Zheng, Xin [1]
Ren, Qinyuan [2]
He, Jingyu [3]
Xiong, Xiaoming [1]
Affiliations
[1] Guangdong Univ Technol, Sch Microelectron, Guangzhou 510006, Guangdong, Peoples R China
[2] Zhejiang Univ, Coll Control Sci & Engn, Hangzhou, Peoples R China
[3] Hong Kong Univ Sci & Technol, Dept Elect & Comp Engn, Hong Kong 999077, Peoples R China
Keywords
Convolutional neural networks; reconfigurable; accelerator; real-time object detection system; design space exploration; NEURAL-NETWORK; HARDWARE ACCELERATOR
DOI
10.1145/3530818
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Deep convolutional neural networks (DNNs) have been widely used in many applications, particularly in machine vision. Accelerating DNNs on embedded systems is challenging because real-world machine vision applications must reserve much of the external memory bandwidth for other tasks, such as video capture and display, leaving little bandwidth for DNN acceleration. To address this issue, we propose a high-throughput accelerator for bandwidth-limited systems, called the reconfigurable tiny neural network accelerator (ReTiNNA), and present a real-time object detection system for high-resolution video. We first present a dedicated computation engine that applies different data-mapping methods to different filter types to improve data reuse and reduce hardware resources. We then propose an adaptive layer-wise tiling strategy that tiles the feature maps into strips, which dramatically reduces the control complexity of data transmission and improves its efficiency. Finally, we present a design space exploration (DSE) approach that explores the design space more accurately when bandwidth is insufficient, improving the performance of the low-bandwidth accelerator. With a low bandwidth of 2.23 GB/s and a low hardware cost of 90.261K LUTs and 448 DSPs, ReTiNNA still achieves 155.86 GOPS on VGG16 and 68.20 GOPS on ResNet50, which is better than other state-of-the-art designs implemented on FPGA devices. Furthermore, the real-time object detection system achieves an object detection speed of 19 fps for high-resolution video.
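The strip-tiling idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `strip_tiles`, its parameters, and the choice of an unpadded convolution are assumptions made for illustration, and the adaptive, layer-wise choice of strip size is not modeled. The sketch only shows how a feature map can be cut into horizontal strips, with each strip fetching the extra input rows its convolution window needs.

```python
from typing import Iterator, Tuple


def strip_tiles(
    in_rows: int,
    out_rows_per_strip: int,
    kernel: int = 3,
    stride: int = 1,
) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (in_start, in_end, out_start, out_end) row ranges per strip.

    Hypothetical helper: assumes an unpadded ("valid") convolution and
    half-open row ranges. Each strip spans the full feature-map width,
    so only row indices are tracked.
    """
    # Number of output rows produced by an unpadded convolution.
    out_rows = (in_rows - kernel) // stride + 1
    for out_start in range(0, out_rows, out_rows_per_strip):
        out_end = min(out_start + out_rows_per_strip, out_rows)
        # Input rows covered by the convolution windows of this strip.
        in_start = out_start * stride
        in_end = (out_end - 1) * stride + kernel
        yield in_start, in_end, out_start, out_end


if __name__ == "__main__":
    for tile in strip_tiles(in_rows=8, out_rows_per_strip=3):
        print(tile)
```

Because every strip covers whole rows, each transfer in this sketch is a single contiguous row range, and adjacent strips overlap by `kernel - stride` input rows.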
Pages: 20
Related papers
50 in total
  • [31] A High-Performance Accelerator for Real-Time Super-Resolution on Edge FPGAs
    Liu, Hongduo
    Qian, Yijian
    Liang, Youqiang
    Zhang, Bin
    Liu, Zhaohan
    He, Tao
    Zhao, Wenqian
    Lu, Jiangbo
    Yu, Bei
    ACM TRANSACTIONS ON DESIGN AUTOMATION OF ELECTRONIC SYSTEMS, 2024, 29 (03)
  • [32] High-Performance SIFT Hardware Accelerator for Real-Time Image Feature Extraction
    Huang, Feng-Cheng
    Huang, Shi-Yu
    Ker, Ji-Wei
    Chen, Yung-Chang
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2012, 22 (03) : 340 - 351
  • [33] A High-Performance Pixel-Level Fully Pipelined Hardware Accelerator for Neural Networks
    Li, Zhan
    Zhang, Zhihan
    Hu, Jie
    Meng, Qunkang
    Shi, Xingyu
    Luo, Jun
    Wang, Hao
    Huang, Qijun
    Chang, Sheng
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024
  • [34] SPRINT: A High-Performance, Energy-Efficient, and Scalable Chiplet-Based Accelerator With Photonic Interconnects for CNN Inference
    Li, Yuan
    Louri, Ahmed
    Karanth, Avinash
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2022, 33 (10) : 2332 - 2345
  • [35] High-performance Sparsity-aware NPU with Reconfigurable Comparator-multiplier Architecture
    Ryu, Sungju
    Kim, Jae-Joon
    JOURNAL OF SEMICONDUCTOR TECHNOLOGY AND SCIENCE, 2024, 24 (06) : 572 - 577
  • [36] A Reconfigurable High-Performance Multiplier Based on Multi-Granularity Design and Parallel Acceleration
    Jing, Feng
    Liu, Zijun
    Ma, Xiaojun
    Yang, Guo
    Peng, Guo
    Wang, Donglin
    PROCEEDINGS OF 2017 8TH IEEE INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING AND SERVICE SCIENCE (ICSESS 2017), 2017, : 567 - 570
  • [37] A Dynamically Reconfigurable Platform for High-Performance and Low-Power On-Board Processing
    Guerrieri, Andrea
    Kashani-Akhavan, Sahand
    Lombardi, Pasquale
    Belhadj, Bilel
    Ienne, Paolo
    2018 NASA/ESA CONFERENCE ON ADAPTIVE HARDWARE AND SYSTEMS (AHS 2018), 2018, : 74 - 81
  • [38] REACT: Scalable and High-Performance Regular Expression Pattern Matching Accelerator for In-Storage Processing
    Jeong, Won Seob
    Lee, Changmin
    Kim, Keunsoo
    Yoon, Myung Kuk
    Jeon, Won
    Jung, Myoungsoo
    Ro, Won Woo
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2020, 31 (05) : 1137 - 1151
  • [39] Generating High-Performance FPGA Accelerator Designs for Big Data Analytics with Fletcher and Apache Arrow
    Peltenburg, Johan
    van Straten, Jeroen
    Brobbel, Matthijs
    Al-Ars, Zaid
    Hofstee, H. Peter
    Journal of Signal Processing Systems, 2021, 93 : 565 - 586
  • [40] BSTMSM: A High-Performance FPGA-based Multi-Scalar Multiplication Hardware Accelerator
    Zhao, Baoze
    Huang, Wenjin
    Li, Tianrui
    Huang, Yihua
    2023 INTERNATIONAL CONFERENCE ON FIELD PROGRAMMABLE TECHNOLOGY, ICFPT, 2023, : 35 - 43