High-performance Reconfigurable DNN Accelerator on a Bandwidth-limited Embedded System

Cited by: 4
Authors
Hu, Xianghong [1]
Huang, Hongmin [1]
Li, Xueming [1]
Zheng, Xin [1]
Ren, Qinyuan [2]
He, Jingyu [3]
Xiong, Xiaoming [1]
Affiliations
[1] Guangdong Univ Technol, Sch Microelectron, Guangzhou 510006, Guangdong, Peoples R China
[2] Zhejiang Univ, Coll Control Sci & Engn, Hangzhou, Peoples R China
[3] Hong Kong Univ Sci & Technol, Dept Elect & Comp Engn, Hong Kong 999077, Peoples R China
Keywords
Convolutional neural networks; reconfigurable; accelerator; real-time object detection system; design space exploration; neural network; hardware accelerator
DOI
10.1145/3530818
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Deep convolutional neural networks (DNNs) have been widely used in many applications, particularly in machine vision. Accelerating DNNs on embedded systems is challenging because real-world machine vision applications must reserve much of the external memory bandwidth for other tasks, such as video capture and display, leaving little bandwidth for DNN acceleration. To address this issue, we propose a high-throughput accelerator for bandwidth-limited systems, called the reconfigurable tiny neural network accelerator (ReTiNNA), and present a real-time object detection system for high-resolution video. We first present a dedicated computation engine that applies different data-mapping methods to different filter types to improve data reuse and reduce hardware resources. We then propose an adaptive layer-wise tiling strategy that tiles the feature maps into strips, which dramatically reduces the control complexity of data transmission and improves its efficiency. Finally, we present a design space exploration (DSE) approach that explores the design space more accurately when bandwidth is insufficient, improving the performance of the low-bandwidth accelerator. With a bandwidth of only 2.23 GB/s and a hardware cost of 90.261K LUTs and 448 DSPs, ReTiNNA still achieves 155.86 GOPS on VGG16 and 68.20 GOPS on ResNet50, outperforming other state-of-the-art designs implemented on FPGAs. Furthermore, the real-time object detection system achieves a detection speed of 19 fps on high-resolution video.
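The abstract does not spell out the DSE model, but its bandwidth figures already constrain one: sustaining 155.86 GOPS within 2.23 GB/s requires, on average, at least 155.86 / 2.23 ≈ 69.9 operations per byte of off-chip traffic. The Python sketch below is a generic roofline-style search over strip height for a single convolution layer, offered only as an illustration and not as ReTiNNA's actual method. Everything in it is a hypothetical assumption: the names `ConvLayer`, `strip_traffic_bytes`, `buffer_bytes_needed`, and `explore_strip_height`; the traffic model (each strip re-reads its halo rows and reloads the full weight set, 8-bit data); perfect overlap of computation and transfer; and the example layer, 200 GOPS compute roof, and 512 KiB buffer. Only the 2.23 GB/s budget comes from the abstract.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvLayer:
    """Hypothetical stride-1, 'same'-padded convolution (sizes in elements)."""
    height: int
    width: int
    c_in: int
    c_out: int
    kernel: int  # odd kernel size, e.g. 3

    def ops(self) -> int:
        # Two operations (multiply + add) per multiply-accumulate.
        return 2 * self.height * self.width * self.c_in * self.c_out * self.kernel ** 2


def strip_traffic_bytes(layer: ConvLayer, rows: int, bytes_per_elem: int = 1) -> int:
    """Off-chip bytes when output rows are computed in horizontal strips of `rows`.

    Assumed model: each strip loads the input rows it needs (its own rows plus
    the halo, clamped at the image border), reloads all weights once, and
    writes its output rows once.
    """
    pad = layer.kernel // 2
    weights = layer.kernel ** 2 * layer.c_in * layer.c_out
    total = 0
    for start in range(0, layer.height, rows):
        end = min(start + rows, layer.height)
        in_rows = min(layer.height, end + pad) - max(0, start - pad)
        total += in_rows * layer.width * layer.c_in          # input strip + halo
        total += weights                                     # weights per strip
        total += (end - start) * layer.width * layer.c_out   # output strip
    return total * bytes_per_elem


def buffer_bytes_needed(layer: ConvLayer, rows: int, bytes_per_elem: int = 1) -> int:
    """On-chip storage for one input strip (with halo), one output strip, and all weights."""
    pad = layer.kernel // 2
    in_rows = min(layer.height, rows + 2 * pad)
    weights = layer.kernel ** 2 * layer.c_in * layer.c_out
    elems = in_rows * layer.width * layer.c_in + rows * layer.width * layer.c_out + weights
    return elems * bytes_per_elem


def explore_strip_height(layer, peak_gops, bw_gbs, buffer_bytes, bytes_per_elem=1):
    """Roofline-style search: pick the strip height with the lowest estimated latency.

    Latency is max(compute time, transfer time), i.e. compute and DMA are
    assumed to overlap perfectly (e.g. via double buffering).
    Returns (strip height, attainable GOPS).
    """
    best = None
    for rows in range(1, layer.height + 1):
        if buffer_bytes_needed(layer, rows, bytes_per_elem) > buffer_bytes:
            continue  # this strip height does not fit on chip
        traffic = strip_traffic_bytes(layer, rows, bytes_per_elem)
        seconds = max(layer.ops() / (peak_gops * 1e9), traffic / (bw_gbs * 1e9))
        if best is None or seconds < best[1]:
            best = (rows, seconds)
    if best is None:
        raise ValueError("no strip height fits in the on-chip buffer")
    rows, seconds = best
    return rows, layer.ops() / seconds / 1e9


if __name__ == "__main__":
    # Hypothetical 56x56 layer, 64 -> 64 channels, 3x3 kernel, 8-bit data.
    demo = ConvLayer(height=56, width=56, c_in=64, c_out=64, kernel=3)
    print(explore_strip_height(demo, peak_gops=200.0, bw_gbs=2.23, buffer_bytes=512 * 1024))
```

Under this model, taller strips mean fewer weight reloads and fewer re-read halo rows, so the search tends to settle on the shortest strip that makes the layer compute-bound, or on the tallest strip the buffer allows when none does. A real DSE would also have to account for resource limits such as LUTs and DSPs.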
Pages: 20