Efficient FPGA-Based Transformer Accelerator Using In-Block Balanced Pruning

Citations: 0
Authors
Wang, Saiqun [1 ]
Zhang, Hao [1 ]
Affiliations
[1] Ocean Univ China, Informat Sci & Engn, Qingdao, Peoples R China
Keywords
Transformer Accelerator; Network Pruning; FPGA; Energy-Efficient Computing; Sparse Storage Pattern;
DOI
10.1109/ICCCAS62034.2024.10651591
Chinese Library Classification
TM [Electrical technology]; TN [Electronic technology, communication technology];
Discipline classification codes
0808 ; 0809 ;
Abstract
Transformer models have recently been widely deployed for natural language processing and image processing. However, their superior performance comes with a large number of parameters and a high computational load, which makes transformer models difficult to deploy on resource-limited devices. To reduce the computational cost of transformer models, this paper proposes an improved network pruning method. In the proposed method, the parameter matrix is decomposed into blocks of a fixed size, and pruning is then applied within each block so that every block retains the same number of parameters. To further reduce the memory required for the parameters, an efficient storage pattern for the sparse parameters is also proposed. Finally, by combining these methods, an energy-efficient transformer accelerator architecture is developed. The accelerator is implemented on FPGA devices, and implementation results show that the proposed design significantly improves speed and energy efficiency compared with previous designs.
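The abstract's core idea can be sketched in a few lines: partition the weight matrix into fixed-size blocks, keep only the largest-magnitude entries in each block, and then store the survivors at a fixed stride, since every block holds exactly the same number of nonzeros. The following is a minimal illustrative sketch, not the paper's implementation; the function names, the magnitude-based selection criterion, and the block/keep sizes are assumptions for demonstration.

```python
import numpy as np

def in_block_balanced_prune(weights, block_size=4, keep=2):
    """Zero out all but the `keep` largest-magnitude entries in every
    (block_size x block_size) block, so sparsity is balanced per block."""
    rows, cols = weights.shape
    assert rows % block_size == 0 and cols % block_size == 0
    pruned = np.zeros_like(weights)
    for r in range(0, rows, block_size):
        for c in range(0, cols, block_size):
            block = weights[r:r + block_size, c:c + block_size]
            flat = np.abs(block).ravel()
            # indices of the `keep` largest-magnitude entries in this block
            top = np.argpartition(flat, -keep)[-keep:]
            mask = np.zeros(flat.size, dtype=bool)
            mask[top] = True
            pruned[r:r + block_size, c:c + block_size] = \
                block * mask.reshape(block.shape)
    return pruned

def pack_balanced_sparse(pruned, block_size=4):
    """Pack nonzeros block by block. Because every block holds the same
    number of nonzeros, values and in-block indices sit at a fixed
    stride, so no per-block length metadata is needed."""
    vals, idxs = [], []
    rows, cols = pruned.shape
    for r in range(0, rows, block_size):
        for c in range(0, cols, block_size):
            block = pruned[r:r + block_size, c:c + block_size].ravel()
            nz = np.flatnonzero(block)
            vals.append(block[nz])
            idxs.append(nz)
    return np.concatenate(vals), np.concatenate(idxs).astype(np.uint8)
```

The fixed per-block nonzero count is what makes the storage pattern hardware-friendly: each processing element can fetch its block's values and indices with a constant offset, avoiding the irregular pointer chasing of general sparse formats such as CSR.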
Pages: 18 - 23
Page count: 6