High-performance Reconfigurable DNN Accelerator on a Bandwidth-limited Embedded System

Times cited: 4

Authors:

Hu, Xianghong [1]
Huang, Hongmin [1]
Li, Xueming [1]
Zheng, Xin [1]
Ren, Qinyuan [2]
He, Jingyu [3]
Xiong, Xiaoming [1]

Affiliations:

[1] Guangdong Univ Technol, Sch Microelectron, Guangzhou 510006, Guangdong, Peoples R China

[2] Zhejiang Univ, Coll Control Sci & Engn, Hangzhou, Peoples R China

[3] Hong Kong Univ Sci & Technol, Dept Elect & Comp Engn, Hong Kong 999077, Peoples R China

Source:

ACM TRANSACTIONS ON EMBEDDED COMPUTING SYSTEMS | 2023 / Vol. 22 / Issue 6

Keywords:

Convolutional neural networks; reconfigurable; accelerator; real-time object detection system; design space exploration; neural network; hardware accelerator

DOI:

10.1145/3530818

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline code:

0812

Abstract:

Deep convolutional neural networks (DNNs) have been widely used in many applications, particularly in machine vision. Accelerating DNNs on embedded systems is challenging because real-world machine vision applications must reserve much of the external memory bandwidth for other tasks, such as video capture and display, leaving little bandwidth for DNN acceleration. To address this issue, we propose a high-throughput accelerator for bandwidth-limited systems, the reconfigurable tiny neural network accelerator (ReTiNNA), and present a real-time object detection system for high-resolution video. We first present a dedicated computation engine that applies different data-mapping methods to different filter types to improve data reuse and reduce hardware resources. We then propose an adaptive layer-wise tiling strategy that tiles the feature maps into strips, which dramatically reduces the control complexity of data transmission and improves its efficiency. Finally, we present a design space exploration (DSE) approach that explores the design space more accurately when bandwidth is insufficient, improving the performance of the low-bandwidth accelerator. With a bandwidth of only 2.23 GB/s and a hardware cost of 90.261K LUTs and 448 DSPs, ReTiNNA achieves 155.86 GOPS on VGG16 and 68.20 GOPS on ResNet50, outperforming other state-of-the-art FPGA-based designs. Furthermore, the real-time object detection system achieves an object detection speed of 19 fps on high-resolution video.
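The abstract's central trade-off (throughput capped by off-chip bandwidth, mitigated by strip tiling and a bandwidth-aware DSE) can be illustrated with a generic roofline model. This is a minimal sketch, not the ReTiNNA method: the 200 MHz clock, one multiply-accumulate per DSP per cycle, 8-bit data, per-strip weight reloading, the on-chip buffer limit and all helper names (`ConvLayer`, `input_rows_fetched`, `traffic_bytes`, `attainable_gops`, `explore_strip_height`) are hypothetical assumptions. Only the 2.23 GB/s bandwidth, the 448 DSPs and the 155.86 GOPS figure come from the abstract. For instance, sustaining 155.86 GOPS from 2.23 GB/s requires on average about 69.9 operations per byte fetched, assuming the whole bandwidth serves the accelerator.

```python
import math
from dataclasses import dataclass

# Figures quoted in the abstract.
BANDWIDTH_GB_S = 2.23  # off-chip bandwidth left for the accelerator (GB/s)
VGG16_GOPS = 155.86    # reported VGG16 throughput (GOPS)
NUM_DSPS = 448         # reported DSP count

# Hypothetical platform parameters (assumptions, not from the paper).
CLOCK_GHZ = 0.2          # assumed 200 MHz accelerator clock
OPS_PER_DSP_CYCLE = 2    # assumed: one multiply-accumulate per DSP per cycle

# Average ops per off-chip byte needed to sustain VGG16_GOPS (about 69.9),
# assuming the whole 2.23 GB/s serves the accelerator and 1 GB = 1e9 bytes.
MIN_INTENSITY = VGG16_GOPS / BANDWIDTH_GB_S


@dataclass
class ConvLayer:
    """Hypothetical stride-1, 'same'-padded KxK convolution on 8-bit data."""
    h: int
    w: int
    c_in: int
    c_out: int
    k: int

    def ops(self) -> int:
        # Each multiply-accumulate counts as 2 operations.
        return 2 * self.h * self.w * self.c_in * self.c_out * self.k * self.k


def input_rows_fetched(h: int, k: int, strip_rows: int) -> int:
    """Input rows read off-chip when the output is split into horizontal
    strips of `strip_rows` rows; each strip also reads its halo rows, so rows
    on strip boundaries are fetched more than once."""
    pad = (k - 1) // 2
    total = 0
    for start in range(0, h, strip_rows):
        last = min(start + strip_rows, h) - 1          # last output row
        lo, hi = max(start - pad, 0), min(last + pad, h - 1)
        total += hi - lo + 1
    return total


def traffic_bytes(layer: ConvLayer, strip_rows: int) -> int:
    """Off-chip bytes for one layer, assuming (hypothetically) that weights
    are reloaded once per strip and outputs are written exactly once."""
    n_strips = math.ceil(layer.h / strip_rows)
    inputs = input_rows_fetched(layer.h, layer.k, strip_rows) * layer.w * layer.c_in
    weights = n_strips * layer.k * layer.k * layer.c_in * layer.c_out
    outputs = layer.h * layer.w * layer.c_out
    return inputs + weights + outputs


def attainable_gops(layer: ConvLayer, strip_rows: int) -> float:
    """Roofline bound: min(compute roof, bandwidth x arithmetic intensity)."""
    compute_roof = NUM_DSPS * OPS_PER_DSP_CYCLE * CLOCK_GHZ
    intensity = layer.ops() / traffic_bytes(layer, strip_rows)  # ops per byte
    return min(compute_roof, BANDWIDTH_GB_S * intensity)


def explore_strip_height(layer: ConvLayer, buffer_bytes: int) -> int:
    """Brute-force DSE over strip heights: return the smallest height with the
    best roofline bound whose input strip (rows + halo) fits in a hypothetical
    on-chip buffer (falls back to 1 if even one row does not fit)."""
    best_rows, best_perf = 1, -1.0
    for rows in range(1, layer.h + 1):
        if (rows + layer.k - 1) * layer.w * layer.c_in > buffer_bytes:
            break  # taller strips would need even more buffer
        perf = attainable_gops(layer, rows)
        if perf > best_perf:  # strict '>' keeps the smaller height on ties
            best_rows, best_perf = rows, perf
    return best_rows


if __name__ == "__main__":
    layer = ConvLayer(h=14, w=14, c_in=512, c_out=512, k=1)  # hypothetical layer
    print(f"required intensity: {MIN_INTENSITY:.1f} ops/byte")
    for buf in (16 * 1024, 64 * 1024):
        rows = explore_strip_height(layer, buf)
        print(f"buffer {buf} B -> strip height {rows}, "
              f"{attainable_gops(layer, rows):.1f} GOPS")
```

In this toy model, taller strips cut halo re-reads and weight reloads, raising arithmetic intensity until the layer becomes compute-bound, while the buffer limit caps how tall a strip can be. A real DSE would also sweep parallelism factors and per-layer buffer costs; the loop here only shows how a bandwidth-aware bound can steer one tiling parameter.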


Pages: 20
