DIF-LUT: A Simple Yet Scalable Approximation for Non-linear Activation Function on FPGA

Times cited: 0
Authors
Liu, Yang [1]
He, Xiaoming [1]
Yu, Jun [1]
Wang, Kun [1]
Affiliation
[1] Fudan Univ, State Key Lab ASIC & Syst, Shanghai, Peoples R China
Source
2023 33RD INTERNATIONAL CONFERENCE ON FIELD-PROGRAMMABLE LOGIC AND APPLICATIONS, FPL | 2023
Keywords
Non-linear Approximation; Activation Function; Neural Network; Look-up Table; FPGA; OPU;
DOI
10.1109/FPL60245.2023.00055
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Non-linear activation functions play an essential role in the generalization ability of neural networks (NNs). However, deploying their intricate mathematical operations on hardware platforms such as Field-Programmable Gate Arrays (FPGAs) turns out to be a great challenge. Prior works based on piecewise functions or look-up tables (LUTs) either involve complex manual operations or neglect hardware overhead. To this end, this paper proposes a simple yet scalable and effective approximation called DIF-LUT, which is applicable to various non-linear functions. Specifically, the proposed method achieves accurate approximation by using piecewise linear matching to roughly fit the function derivative and a range addressable LUT to offset the remaining difference. Moreover, self-adaptive mechanisms automatically minimize hardware cost for different accuracy requirements. Experiments show that, compared to state-of-the-art methods, DIF-LUT uses 43.68% fewer LUTs and 70.8% fewer flip-flops (FFs) without any digital signal processor (DSP), while achieving 2.7x approximation accuracy at 554.1 MHz on Xilinx Zynq UltraScale+.
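The abstract gives no implementation details, so the following is only a minimal software sketch of the general idea it describes: a coarse piecewise linear fit plus a small table that stores the remaining difference. Everything concrete here is an assumption made for illustration and is not taken from the paper: the choice of sigmoid as the target, the hypothetical `SEGMENTS` table with power-of-two slopes, the bin width `BIN`, and the use of uniform bins in place of a true range addressable LUT.

```python
import math


def sigmoid(x):
    """Reference function that the sketch approximates."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical segment table for x in [0, 8): (upper bound, slope, intercept).
# Slopes are powers of two (or zero) so that a hardware version could use
# shifts instead of multipliers; this choice is an assumption, not the paper's.
SEGMENTS = [
    (1.0, 0.25, 0.5),
    (2.5, 0.125, 0.625),
    (5.0, 0.03125, 0.859375),
    (8.0, 0.0, 1.0),
]

BIN = 1.0 / 16.0          # assumed width of one difference-table bin
N_BINS = int(8.0 / BIN)   # 128 entries covering [0, 8)


def pwl(x):
    """Coarse piecewise linear fit for non-negative x."""
    for upper, slope, intercept in SEGMENTS:
        if x < upper:
            return slope * x + intercept
    return 1.0


# Difference table: residual between the reference function and the coarse
# fit, sampled at the midpoint of each bin.
DIFF_LUT = [
    sigmoid((i + 0.5) * BIN) - pwl((i + 0.5) * BIN) for i in range(N_BINS)
]


def approx_sigmoid(x):
    """Coarse fit plus table correction; uses sigmoid(-x) = 1 - sigmoid(x)."""
    ax = abs(x)
    if ax >= 8.0:
        y = 1.0  # saturate outside the tabulated range
    else:
        y = pwl(ax) + DIFF_LUT[int(ax / BIN)]
    return y if x >= 0 else 1.0 - y


if __name__ == "__main__":
    xs = [i / 1000.0 for i in range(-10000, 10001)]
    coarse = max(abs(pwl(abs(x)) - sigmoid(abs(x))) for x in xs)
    corrected = max(abs(approx_sigmoid(x) - sigmoid(x)) for x in xs)
    print(f"max error, coarse fit only:   {coarse:.5f}")
    print(f"max error, with difference LUT: {corrected:.5f}")
```

In this toy setting the table only has to store small residuals, which is why a difference table can use narrower entries than a table of full function values. The self-adaptive sizing mentioned in the abstract is not modelled here.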
Pages: 322-326
Number of pages: 5