High-performance Reconfigurable DNN Accelerator on a Bandwidth-limited Embedded System

Cited by: 4
Authors
Hu, Xianghong [1]
Huang, Hongmin [1]
Li, Xueming [1]
Zheng, Xin [1]
Ren, Qinyuan [2]
He, Jingyu [3]
Xiong, Xiaoming [1]
Affiliations
[1] Guangdong Univ Technol, Sch Microelectron, Guangzhou 510006, Guangdong, Peoples R China
[2] Zhejiang Univ, Coll Control Sci & Engn, Hangzhou, Peoples R China
[3] Hong Kong Univ Sci & Technol, Dept Elect & Comp Engn, Hong Kong 999077, Peoples R China
Keywords
Convolutional neural networks; reconfigurable; accelerator; real-time object detection system; design space exploration; NEURAL-NETWORK; HARDWARE ACCELERATOR
DOI
10.1145/3530818
CLC number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Deep convolutional neural networks (DNNs) have been widely used in many applications, particularly in machine vision. It is challenging to accelerate DNNs on embedded systems because real-world machine vision applications must reserve much of the external memory bandwidth for other tasks, such as video capture and display, leaving little bandwidth for accelerating DNNs. To address this issue, we propose a high-throughput accelerator, called the reconfigurable tiny neural network accelerator (ReTiNNA), for bandwidth-limited systems and present a real-time object detection system for high-resolution video. We first present a dedicated computation engine that applies different data-mapping methods to different filter types to improve data reuse and reduce hardware resources. We then propose an adaptive layer-wise tiling strategy that tiles the feature maps into strips to dramatically reduce the control complexity of data transmission and to improve its efficiency. Finally, a design space exploration (DSE) approach is presented that explores the design space more accurately when bandwidth is insufficient, improving the performance of the low-bandwidth accelerator. With a low bandwidth of 2.23 GB/s and a low hardware consumption of 90.261K LUTs and 448 DSPs, ReTiNNA can still achieve a high performance of 155.86 GOPS on VGG16 and 68.20 GOPS on ResNet50, which is better than other state-of-the-art designs implemented on FPGA devices. Furthermore, the real-time object detection system can achieve a high object detection speed of 19 fps for high-resolution video.
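The figures quoted in the abstract can be combined into two derived ratios: throughput per DSP, and the minimum average number of operations that must be performed per byte of external memory traffic to reach the reported throughput at the stated bandwidth. The sketch below is illustrative arithmetic only. It assumes that 1 GB/s means 10^9 bytes/s and that the full 2.23 GB/s is available to the accelerator; the function and constant names are hypothetical and do not come from the paper.

```python
# Figures quoted in the abstract (not measured here).
BANDWIDTH_GB_PER_S = 2.23   # external memory bandwidth; assumed 1 GB = 1e9 bytes
DSP_COUNT = 448
THROUGHPUT_GOPS = {"VGG16": 155.86, "ResNet50": 68.20}


def derived_metrics(gops, dsps=DSP_COUNT, bandwidth=BANDWIDTH_GB_PER_S):
    """Return ratios derived from a reported throughput (hypothetical helper).

    gops_per_dsp:     throughput divided by the number of DSP blocks.
    min_ops_per_byte: lower bound on average operations per byte moved,
                      since throughput cannot exceed bandwidth * ops/byte.
    """
    return {
        "gops_per_dsp": gops / dsps,
        "min_ops_per_byte": gops / bandwidth,
    }


if __name__ == "__main__":
    for network, gops in THROUGHPUT_GOPS.items():
        metrics = derived_metrics(gops)
        print(
            f"{network}: {metrics['gops_per_dsp']:.3f} GOPS/DSP, "
            f"at least {metrics['min_ops_per_byte']:.1f} ops/byte"
        )
```

The operations-per-byte value is only a lower bound: if the accelerator uses less than the full stated bandwidth, the actual data reuse is higher.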
Pages: 20
Related papers
50 in total
  • [41] Generating High-Performance FPGA Accelerator Designs for Big Data Analytics with Fletcher and Apache Arrow
    Peltenburg, Johan
    van Straten, Jeroen
    Brobbel, Matthijs
    Al-Ars, Zaid
    Hofstee, H. Peter
    JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 2021, 93 (05): 565-586
  • [42] A Vector Systolic Accelerator for Multi-Precision Floating-Point High-Performance Computing
    Li, Kai
    Mao, Wei
    Zhou, Junzhuo
    Li, Boyu
    Yang, Zhengke
    Yang, Shuxing
    Du, Laimin
    Huang, Sixiao
    Yu, Hao
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II-EXPRESS BRIEFS, 2022, 69 (10): 4123-4127
  • [43] Synthesizing High-performance Reconfigurable Meta-devices through Multi-objective Optimization
    Campbell, Sawyer D.
    Wu, Yuhao
    Whiting, Eric B.
    Kang, Lei
    Werner, Pingjuan L.
    Werner, Douglas H.
    APPLIED COMPUTATIONAL ELECTROMAGNETICS SOCIETY JOURNAL, 2020, 35 (11): 1441-1442
  • [44] Synthesizing High-performance Reconfigurable Meta-devices through Multi-objective Optimization
    Campbell, Sawyer D.
    Wu, Yuhao
    Whiting, Eric B.
    Kang, Lei
    Werner, Pingjuan L.
    Werner, Douglas H.
    2020 INTERNATIONAL APPLIED COMPUTATIONAL ELECTROMAGNETICS SOCIETY SYMPOSIUM (2020 ACES-MONTEREY), 2020
  • [45] SHP-FsNTT: A Scalable and High-Performance NTT Accelerator Based on the Four-step Algorithm
    Chen, Xiaojie
    Lu, Weicong
    Su, Tao
    Chen, Dihu
    2024 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, ISCAS 2024, 2024
  • [46] A High-Performance and Ultra-Low-Power Accelerator Design for Advanced Deep Learning Algorithms on an FPGA
    Gundrapally, Achyuth
    Shah, Yatrik Ashish
    Alnatsheh, Nader
    Choi, Kyuwon Ken
    ELECTRONICS, 2024, 13 (13)
  • [47] BiSon-e: A Lightweight and High-Performance Accelerator for Narrow Integer Linear Algebra Computing on the Edge
    Reggiani, Enrico
    Ramirez Lazo, Cristobal
    Figueras Bague, Roger
    Cristal, Adrian
    Olivieri, Mauro
    Sabri Unsal, Osman
    ASPLOS '22: PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON ARCHITECTURAL SUPPORT FOR PROGRAMMING LANGUAGES AND OPERATING SYSTEMS, 2022: 56-69
  • [48] FNNG: A High-Performance FPGA-based Accelerator for K-Nearest Neighbor Graph Construction
    Liu, Chaoqiang
    Liu, Haifeng
    Zheng, Long
    Huang, Yu
    Ye, Xiangyu
    Liao, Xiaofei
    Jin, Hai
    PROCEEDINGS OF THE 2023 ACM/SIGDA INTERNATIONAL SYMPOSIUM ON FIELD PROGRAMMABLE GATE ARRAYS, FPGA 2023, 2023: 67-77
  • [49] A High-Performance Genomic Accelerator for Accurate Sequence-to-Graph Alignment Using Dynamic Programming Algorithm
    Zeng, Gang
    Zhu, Jianfeng
    Zhang, Yichi
    Chen, Ganhui
    Yuan, Zhenhai
    Wei, Shaojun
    Liu, Leibo
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2024, 35 (02): 237-249
  • [50] MuDBN: An Energy-Efficient and High-Performance Multi-FPGA Accelerator for Deep Belief Networks
    Cheng, Yuming
    Wang, Chao
    Zhao, Yangyang
    Chen, Xianglan
    Zhou, Xuehai
    Li, Xi
    PROCEEDINGS OF THE 2018 GREAT LAKES SYMPOSIUM ON VLSI (GLSVLSI'18), 2018: 435-438