An Accelerated FPGA-Based Parallel CNN-LSTM Computing Device

Citations: 0
Authors
Zhou, Xin [1]
Xie, Wei [1]
Zhou, Han [1]
Cheng, Yongjing [1]
Wang, Ximing [1]
Ren, Yun [1]
Yuan, Shandong [1]
Li, Liuwen [1]
Affiliation
[1] Natl Univ Def Technol, Coll Informat & Commun, Wuhan 430000, Hubei, Peoples R China
Source
IEEE ACCESS | 2024 / Vol. 12
Funding
National Natural Science Foundation of China;
Keywords
Long short term memory; Feature extraction; Convolutional neural networks; Convolution; Computer architecture; Field programmable gate arrays; Pipelines; CNN-LSTM; field programmable gate array (FPGA); hardware acceleration; deep learning;
DOI
10.1109/ACCESS.2024.3437663
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Recently, the combination of a convolutional neural network (CNN) and long short-term memory (LSTM) has exhibited better performance than either network architecture alone. Most of these studies connect the LSTM network behind the CNN, so that, when run on hardware, the current CNN-LSTM design resembles a pipeline architecture. However, this classic structure leads to feature loss when data is sent to the LSTM, since the CNN is not good at extracting temporal features. At the same time, as depth and scale increase, the amount of computation grows sharply, which makes hardware implementation difficult. To address this, a parallel CNN-LSTM architecture is proposed in which the two networks extract features from the input data synchronously; it has been shown to be more effective than the classical CNN-LSTM. This paper designs a parallel CNN-LSTM computing device based on an FPGA. The device is divided into a control unit and an operation unit. A control stream and a data stream are transported between the two units, ensuring that the device runs properly. A highly parallel multi-channel convolution layer and pooling layer are designed to improve computational efficiency. A 4-stage pipeline structure is adopted to implement the LSTM part. This paper makes full use of on-chip BRAM to design a look-up table for activation function approximation, reducing resource consumption by 95% compared with the traditional polynomial approximation. Finally, the device is verified in cooperative spectrum sensing (CSS) and handwriting classification scenarios. The proposed device reaches higher accuracy in both scenarios than the classic CNN-LSTM structure, as well as faster computation, and the overall project power is kept below 2 W. The scalability and limitations of this computing device are also discussed.
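The abstract mentions a look-up table held in on-chip BRAM for activation function approximation. As a rough software illustration of that general idea only, the sketch below precomputes a sigmoid table and reads it by nearest-entry indexing. The table size, input range, floating-point entries, and all function names are assumptions made for this example and are not taken from the paper; a hardware design would more likely use fixed-point values and address bits.

```python
import math

# Assumed parameters (not from the paper): 256 entries covering [-8, 8].
LUT_SIZE = 256
X_MIN, X_MAX = -8.0, 8.0
STEP = (X_MAX - X_MIN) / (LUT_SIZE - 1)

# Precompute the table once, much as a block RAM is initialized up front.
SIGMOID_LUT = [1.0 / (1.0 + math.exp(-(X_MIN + i * STEP))) for i in range(LUT_SIZE)]


def sigmoid_lut(x: float) -> float:
    """Approximate sigmoid(x) by nearest-entry lookup, saturating outside the range."""
    if x <= X_MIN:
        return SIGMOID_LUT[0]
    if x >= X_MAX:
        return SIGMOID_LUT[-1]
    index = int(round((x - X_MIN) / STEP))
    return SIGMOID_LUT[index]


def tanh_lut(x: float) -> float:
    """Reuse the same table through the identity tanh(x) = 2*sigmoid(2x) - 1."""
    return 2.0 * sigmoid_lut(2.0 * x) - 1.0


def max_abs_error(samples: int = 10001) -> float:
    """Largest absolute sigmoid error over evenly spaced points in [-10, 10]."""
    worst = 0.0
    for k in range(samples):
        x = -10.0 + 20.0 * k / (samples - 1)
        exact = 1.0 / (1.0 + math.exp(-x))
        worst = max(worst, abs(sigmoid_lut(x) - exact))
    return worst
```

With these assumed settings the lookup replaces the exponential and division with a single indexed read, at the cost of a bounded approximation error that shrinks as the table grows. Sharing one table between sigmoid and tanh is a common trick, shown here only as one possible design choice.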
Pages: 106579 / 106592
Page count: 14