High-performance Reconfigurable DNN Accelerator on a Bandwidth-limited Embedded System

Cited by: 4

Authors:

Hu, Xianghong [1]

Huang, Hongmin [1]

Li, Xueming [1]

Zheng, Xin [1]

Ren, Qinyuan [2]

He, Jingyu [3]

Xiong, Xiaoming [1]

Affiliations:

[1] Guangdong Univ Technol, Sch Microelectron, Guangzhou 510006, Guangdong, Peoples R China

[2] Zhejiang Univ, Coll Control Sci & Engn, Hangzhou, Peoples R China

[3] Hong Kong Univ Sci & Technol, Dept Elect & Comp Engn, Hong Kong 999077, Peoples R China

Source:

ACM TRANSACTIONS ON EMBEDDED COMPUTING SYSTEMS | 2023 / Vol. 22 / Issue 06

Keywords:

Convolutional neural networks; reconfigurable; accelerator; real-time object detection system; design space exploration; NEURAL-NETWORK; HARDWARE ACCELERATOR;

DOI:

10.1145/3530818

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

Deep convolutional neural networks (DNNs) have been widely used in many applications, particularly in machine vision. It is challenging to accelerate DNNs on embedded systems because real-world machine vision applications must reserve much of the external memory bandwidth for other tasks, such as video capture and display, leaving little bandwidth for accelerating DNNs. To address this issue, we propose a high-throughput accelerator, called reconfigurable tiny neural network accelerator (ReTiNNA), for bandwidth-limited systems and present a real-time object detection system for high-resolution video. We first present a dedicated computation engine that uses different data-mapping methods for various filter types to improve data reuse and reduce hardware resources. We then propose an adaptive layer-wise tiling strategy that tiles the feature maps into strips, which dramatically reduces the control complexity of data transmission and improves its efficiency. Finally, a design space exploration (DSE) approach is presented that explores the design space more accurately when bandwidth is insufficient, improving the performance of the low-bandwidth accelerator. With a low bandwidth of 2.23 GB/s and a low hardware consumption of 90.261K LUTs and 448 DSPs, ReTiNNA can still achieve a high performance of 155.86 GOPS on VGG16 and 68.20 GOPS on ResNet50, which is better than other state-of-the-art designs implemented on FPGA devices. Furthermore, the real-time object detection system can achieve a high object detection speed of 19 fps for high-resolution video.
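
To put the reported figures in context, the following minimal sketch derives two ratios from the numbers quoted in the abstract: operations performed per byte of external memory traffic, and throughput per DSP. It assumes that the full 2.23 GB/s is consumed while running each network and that GB and GOPS both use a factor of 10^9; neither assumption is stated in the abstract. The function and variable names are illustrative only and do not come from the paper.

```python
# Figures quoted in the abstract.
BANDWIDTH_GB_PER_S = 2.23   # external memory bandwidth
DSP_COUNT = 448             # DSP slices used
THROUGHPUT_GOPS = {"VGG16": 155.86, "ResNet50": 68.20}


def ops_per_byte(gops: float, gb_per_s: float) -> float:
    """Operations per byte of external traffic.

    Assumes the whole bandwidth is used and that the 1e9 factors in
    GOPS and GB/s cancel (an assumption, not stated in the abstract).
    """
    return gops / gb_per_s


def gops_per_dsp(gops: float, dsps: int) -> float:
    """Throughput delivered per DSP slice."""
    return gops / dsps


if __name__ == "__main__":
    for net, gops in THROUGHPUT_GOPS.items():
        intensity = ops_per_byte(gops, BANDWIDTH_GB_PER_S)
        density = gops_per_dsp(gops, DSP_COUNT)
        print(f"{net}: {intensity:.2f} ops/byte, {density:.3f} GOPS/DSP")
```

Under these assumptions the reported results correspond to roughly 70 operations per byte on VGG16 and about 31 on ResNet50, which indicates how much on-chip data reuse is needed to sustain the stated throughput at this bandwidth.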

Pages: 20