DIF-LUT: A Simple Yet Scalable Approximation for Non-linear Activation Function on FPGA

Cited by: 0

Authors:

Liu, Yang [1]

He, Xiaoming [1]

Yu, Jun [1]

Wang, Kun [1]

Affiliations:

[1] Fudan Univ, State Key Lab ASIC & Syst, Shanghai, Peoples R China

Source:

2023 33RD INTERNATIONAL CONFERENCE ON FIELD-PROGRAMMABLE LOGIC AND APPLICATIONS, FPL | 2023

Keywords:

Non-linear Approximation; Activation Function; Neural Network; Look-up Table; FPGA; OPU;

DOI:

10.1109/FPL60245.2023.00055

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Non-linear activation functions play an essential role in the generalization ability of neural networks (NNs). However, deploying their intricate mathematical operations on hardware platforms such as Field-Programmable Gate Arrays (FPGAs) is a great challenge. Prior works based on piecewise functions or look-up tables (LUTs) either involve complex manual operations or neglect hardware overhead. To this end, this paper proposes a simple yet scalable and effective approximation called DIF-LUT, which is applicable to various non-linear functions. Specifically, the proposed method achieves accurate approximation by using piecewise linear matching to roughly fit the function derivative and a range-addressable LUT to offset the difference. Moreover, self-adaptive mechanisms are applied to automatically minimize hardware cost for different accuracy targets. Experiments show that, compared to state-of-the-art methods, DIF-LUT uses 43.68% fewer LUTs and 70.8% fewer flip-flops (FFs) and no digital signal processors (DSPs), while achieving 2.7x approximation accuracy at 554.1 MHz on Xilinx Zynq UltraScale+.
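
Illustrative note: the abstract gives no implementation details, so the following is only a generic sketch of the "coarse linear fit plus difference table" idea, not the authors' algorithm. It assumes a sigmoid target, floating-point arithmetic instead of fixed-point hardware, uniform LUT ranges, and hypothetical parameters (`N_SEGMENTS`, `LUT_ENTRIES`, the input range) and function names chosen for illustration.

```python
import math

# Hypothetical parameters; the record does not state the paper's settings.
X_MIN, X_MAX = -8.0, 8.0
N_SEGMENTS = 8      # number of coarse linear pieces
LUT_ENTRIES = 256   # number of difference-table entries


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def build_pwl(f, n):
    """Coarse piecewise-linear fit: one chord per segment, as (x0, slope, y0)."""
    width = (X_MAX - X_MIN) / n
    segs = []
    for i in range(n):
        x0 = X_MIN + i * width
        y0, y1 = f(x0), f(x0 + width)
        segs.append((x0, (y1 - y0) / width, y0))
    return segs


def pwl_eval(segs, x):
    """Evaluate the coarse linear fit at x (segment index is clamped)."""
    width = (X_MAX - X_MIN) / len(segs)
    i = min(max(int((x - X_MIN) / width), 0), len(segs) - 1)
    x0, slope, y0 = segs[i]
    return y0 + slope * (x - x0)


def build_diff_lut(f, segs, entries):
    """Store f minus the linear fit, sampled at the centre of each input range."""
    step = (X_MAX - X_MIN) / entries
    lut = []
    for k in range(entries):
        xc = X_MIN + (k + 0.5) * step
        lut.append(f(xc) - pwl_eval(segs, xc))
    return lut


def approx(segs, lut, x):
    """Linear fit plus the stored difference for the range containing x."""
    x = min(max(x, X_MIN), X_MAX - 1e-9)  # clamp into the table's input range
    step = (X_MAX - X_MIN) / len(lut)
    return pwl_eval(segs, x) + lut[int((x - X_MIN) / step)]


def max_error(g, samples=1600):
    """Largest absolute deviation of g from sigmoid on a uniform grid."""
    step = (X_MAX - X_MIN) / samples
    return max(abs(g(X_MIN + i * step) - sigmoid(X_MIN + i * step))
               for i in range(samples))


segs = build_pwl(sigmoid, N_SEGMENTS)
lut = build_diff_lut(sigmoid, segs, LUT_ENTRIES)
print("linear fit only :", max_error(lambda x: pwl_eval(segs, x)))
print("with diff table :", max_error(lambda x: approx(segs, lut, x)))
```

In this sketch the table stores only the small residual left by the linear fit, so its entries span a much narrower value range than a table of raw function values would. That is one plausible reading of why a difference table can reduce storage width, but the abstract does not confirm the mechanism.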

Citation

Pages: 322-326

Page count: 5

