A Block-Floating-Point Arithmetic Based FPGA Accelerator for Convolutional Neural Networks

Cited: 0
Authors
Zhang, Heshan [1 ]
Liu, Zhenyu [2 ]
Zhang, Guanwen [1 ]
Dai, Jiwu [1 ]
Lian, Xiaocong [3 ]
Zhou, Wei [1 ]
Ji, Xiangyang [3 ]
Affiliations
[1] Northwestern Polytechnical University, School of Electronics and Information, Xi'an, China
[2] Tsinghua University, RIIT & TNList, Beijing, China
[3] Tsinghua University, Department of Automation, Beijing, China
Source
2019 7TH IEEE GLOBAL CONFERENCE ON SIGNAL AND INFORMATION PROCESSING (IEEE GLOBALSIP), 2019
Funding
National Natural Science Foundation of China
Keywords
CNN; FPGA; block-floating-point
DOI
10.1109/globalsip45357.2019.8969292
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Convolutional neural networks (CNNs) have been widely used in computer vision applications and have achieved great success. However, large-scale CNN models demand substantial computing and memory resources, which makes them difficult to deploy on embedded devices. This paper proposes an efficient block-floating-point (BFP) arithmetic. Compared with 32-bit floating-point arithmetic, it reduces the memory and off-chip bandwidth requirements during convolution by 50% and 72.37%, respectively. With BFP arithmetic, the costly multiplication and addition operations on floating-point numbers are replaced by their fixed-point counterparts, which are far more efficient in hardware. A CNN model can be deployed on our accelerator with no more than 0.14% top-1 accuracy loss, without retraining or fine-tuning. By employing a series of ping-pong memory access schemes, 2-dimensional propagate partial multiply-accumulate (PPMAC) processors, and an optimized memory system, we implemented a CNN accelerator on a Xilinx VC709 evaluation board. The accelerator achieves a performance of 665.54 GOP/s and a power efficiency of 89.7 GOP/s/W at a 300 MHz working frequency, significantly outperforming previous FPGA-based accelerators.
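A minimal sketch of the BFP idea the abstract describes, assuming one shared exponent per block and 8-bit fixed-point mantissas; the paper's actual block size, bit widths, and rounding are not given here, and to_bfp / bfp_dot are illustrative names, not the authors' implementation:

    import numpy as np

    def to_bfp(block, mantissa_bits=8):
        # Quantize a 1-D float block to block-floating-point: one shared
        # exponent for the whole block, integer (fixed-point) mantissas.
        max_abs = np.max(np.abs(block))
        if max_abs == 0.0:
            return np.zeros(block.shape, dtype=np.int32), 0
        # Choose the exponent so the largest value fills the mantissa range.
        exp = int(np.ceil(np.log2(max_abs))) - (mantissa_bits - 1)
        lo, hi = -(1 << (mantissa_bits - 1)), (1 << (mantissa_bits - 1)) - 1
        mant = np.clip(np.round(block * 2.0 ** -exp), lo, hi).astype(np.int32)
        return mant, exp

    def bfp_dot(a_mant, a_exp, b_mant, b_exp):
        # Every multiply-accumulate runs in integer arithmetic; the two
        # shared exponents are applied once, after accumulation.
        acc = np.dot(a_mant.astype(np.int64), b_mant.astype(np.int64))
        return float(acc) * 2.0 ** (a_exp + b_exp)

    # Usage: an 8-bit BFP dot product closely tracks the float32 reference.
    rng = np.random.default_rng(0)
    x, w = rng.normal(size=64), rng.normal(size=64)
    xm, xe = to_bfp(x)
    wm, we = to_bfp(w)
    print(bfp_dot(xm, xe, wm, we), float(np.dot(x, w)))

Because all mantissas in a block share one exponent, the inner products of a convolution run entirely in integer multiply-accumulate units, with a single rescaling per block pair; this is what allows floating-point operations to be replaced by fixed-point ones on the FPGA.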
Pages: 5