High-performance Reconfigurable DNN Accelerator on a Bandwidth-limited Embedded System

Times cited: 4

Authors:

Hu, Xianghong [1]

Huang, Hongmin [1]

Li, Xueming [1]

Zheng, Xin [1]

Ren, Qinyuan [2]

He, Jingyu [3]

Xiong, Xiaoming [1]

Affiliations:

[1] Guangdong Univ Technol, Sch Microelectron, Guangzhou 510006, Guangdong, Peoples R China

[2] Zhejiang Univ, Coll Control Sci & Engn, Hangzhou, Peoples R China

[3] Hong Kong Univ Sci & Technol, Dept Elect & Comp Engn, Hong Kong 999077, Peoples R China

Source:

ACM TRANSACTIONS ON EMBEDDED COMPUTING SYSTEMS | 2023 / Vol. 22 / No. 6

Keywords:

Convolutional neural networks; reconfigurable; accelerator; real-time object detection system; design space exploration; NEURAL-NETWORK; HARDWARE ACCELERATOR;

DOI:

10.1145/3530818

CLC number (Chinese Library Classification):

TP3 [Computing technology, computer technology]

Discipline code:

0812 [Computer science and technology]

Abstract:

Deep convolutional neural networks (DNNs) have been widely used in many applications, particularly in machine vision. Accelerating DNNs on embedded systems is challenging because real-world machine vision applications must reserve much of the external memory bandwidth for other tasks, such as video capture and display, leaving little bandwidth for DNN acceleration. To address this issue, we propose a high-throughput accelerator for bandwidth-limited systems, called the reconfigurable tiny neural network accelerator (ReTiNNA), and present a real-time object detection system for high-resolution video. We first present a dedicated computation engine that applies different data-mapping methods to different filter types to improve data reuse and reduce hardware resources. We then propose an adaptive layer-wise tiling strategy that tiles the feature maps into strips, which dramatically reduces the control complexity of data transmission and improves its efficiency. Finally, we present a design space exploration (DSE) approach that explores the design space more accurately when bandwidth is insufficient, improving the performance of the low-bandwidth accelerator. With a low bandwidth of 2.23 GB/s and a low hardware cost of 90.261K LUTs and 448 DSPs, ReTiNNA still achieves 155.86 GOPS on VGG16 and 68.20 GOPS on ResNet50, outperforming other state-of-the-art designs implemented on FPGA devices. Furthermore, the real-time object detection system achieves a detection speed of 19 fps on high-resolution video.
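The abstract does not say how strips are formed or how the DSE models bandwidth. The following Python sketch is one hypothetical illustration, not ReTiNNA's implementation. It assumes a zero-padded k x k convolution with stride s, full-width horizontal row strips, and a plain roofline bound (attainable throughput = min(peak, bandwidth x operations per byte)). The names `tile_into_strips`, `input_bytes`, and `attainable_gops`, and the 200 GOPS peak in the example, are illustrative assumptions; only the 2.23 GB/s figure comes from the abstract.

```python
# Hypothetical sketch: not the ReTiNNA implementation. Assumes zero-padded,
# stride-s k x k convolution, full-width row strips, and a plain roofline model.
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Strip:
    out_start: int  # first output row computed from this strip
    out_end: int    # one past the last output row
    in_start: int   # first input row fetched from DRAM (after clamping)
    in_end: int     # one past the last input row fetched


def tile_into_strips(height: int, kernel: int, stride: int, pad: int,
                     rows_per_strip: int) -> List[Strip]:
    """Split a conv layer's output rows into full-width horizontal strips.

    When kernel > stride, adjacent strips overlap by (kernel - stride)
    input rows (the halo), which are fetched again for each strip.
    """
    out_h = (height + 2 * pad - kernel) // stride + 1
    strips = []
    for r0 in range(0, out_h, rows_per_strip):
        r1 = min(r0 + rows_per_strip, out_h)
        first = r0 * stride - pad                 # may fall in the top padding
        last = (r1 - 1) * stride - pad + kernel   # exclusive; may fall in bottom padding
        strips.append(Strip(r0, r1, max(first, 0), min(last, height)))
    return strips


def input_bytes(strips: List[Strip], width: int, channels: int,
                bytes_per_elem: int = 1) -> int:
    """Off-chip input traffic for one layer, counting re-fetched halo rows."""
    rows = sum(s.in_end - s.in_start for s in strips)
    return rows * width * channels * bytes_per_elem


def attainable_gops(peak_gops: float, bandwidth_gbps: float,
                    ops: float, traffic_bytes: float) -> float:
    """Roofline bound: the smaller of peak compute and bandwidth x ops-per-byte."""
    return min(peak_gops, bandwidth_gbps * ops / traffic_bytes)


if __name__ == "__main__":
    strips = tile_into_strips(height=8, kernel=3, stride=1, pad=1, rows_per_strip=3)
    for s in strips:
        print(s)
    print(input_bytes(strips, width=8, channels=1))  # 96 bytes vs. 64 untiled
    print(attainable_gops(200.0, 2.23, ops=1e9, traffic_bytes=1e8))
```

In this toy layer, strips of three output rows fetch 12 input rows instead of 8 because of the two-row halo at each strip boundary. Taller strips cut that overhead but need more on-chip buffering, which is the kind of trade-off a bandwidth-aware DSE would have to weigh.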


Pages: 20
