NN-LUT: Neural Approximation of Non-Linear Operations for Efficient Transformer Inference

Cited by: 20
Authors
Yu, Joonsang [1 ]
Park, Junki [2 ]
Park, Seongmin [3 ]
Kim, Minsoo [3 ]
Lee, Sihwa [3 ]
Lee, Dong Hyun [2 ]
Choi, Jungwook [3 ]
Affiliations
[1] NAVER Clova, Seongnam, South Korea
[2] Samsung Advanced Institute of Technology, Mountain View, CA, USA
[3] Hanyang University, Seoul, South Korea
Source
Proceedings of the 59th ACM/IEEE Design Automation Conference (DAC 2022), 2022
Keywords
Neural network; Transformer; Non-linear function; Look-up table
DOI
10.1145/3489517.3530505
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Non-linear operations such as GELU, layer normalization, and Softmax are essential yet costly building blocks of Transformer models. Several prior works have simplified these operations with look-up tables or integer computations, but such approximations suffer from inferior accuracy or considerable hardware cost and long latency. This paper proposes an accurate and hardware-friendly approximation framework for efficient Transformer inference. Our framework employs a simple neural network as a universal approximator, with its structure equivalently transformed into a look-up table (LUT). The proposed framework, called Neural-network-generated LUT (NN-LUT), can accurately replace all the non-linear operations in popular BERT models with significant reductions in area, power consumption, and latency.
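To make the core transformation concrete, the following is a minimal sketch (not the authors' implementation) of how a one-hidden-layer ReLU network, which is piecewise linear by construction, can be rewritten exactly as a breakpoint table of slope/intercept segments. The function names (gelu, nn_forward, lut_forward), the hidden-unit count K = 16, and the random placeholder parameters are illustrative assumptions; in the paper the network would be trained to fit each target operation (GELU, Softmax, layer normalization) before conversion.

```python
# Minimal sketch (illustration only) of the NN-LUT idea: a one-hidden-layer
# ReLU network is piecewise linear, so it can be rewritten exactly as a small
# table of (breakpoint, slope, intercept) segments and evaluated by lookup.
import numpy as np


def gelu(x):
    """Reference non-linearity to be approximated (tanh-based GELU)."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))


# Hypothetical tiny approximator: y = sum_i v_i * relu(w_i * x + b_i) + c.
# Random placeholder parameters are used here only to show the exact
# NN -> LUT rewrite; real parameters would be trained to fit the target op.
rng = np.random.default_rng(0)
K = 16                                  # hidden units -> K breakpoints, K + 1 segments
w, b = rng.normal(size=K), rng.normal(size=K)
v, c = rng.normal(size=K), 0.0


def nn_forward(x):
    """Evaluate the one-hidden-layer ReLU network directly."""
    return np.maximum(0.0, np.outer(x, w) + b) @ v + c


# Each hidden unit switches at its breakpoint t_i = -b_i / w_i; between two
# consecutive breakpoints the set of active units is fixed, so the network is
# a single affine function there. Record slope/intercept per segment.
t = np.sort(-b / w)
edges = np.concatenate(([-np.inf], t, [np.inf]))
slopes, intercepts = [], []
for lo, hi in zip(edges[:-1], edges[1:]):
    # Probe a point strictly inside the segment to find the active units.
    if np.isinf(lo):
        x0 = hi - 1.0
    elif np.isinf(hi):
        x0 = lo + 1.0
    else:
        x0 = 0.5 * (lo + hi)
    active = (w * x0 + b) > 0
    slopes.append(np.sum(v[active] * w[active]))
    intercepts.append(np.sum(v[active] * b[active]) + c)
slopes, intercepts = np.array(slopes), np.array(intercepts)
breaks = edges[:-1]                     # segment j covers [breaks[j], edges[j + 1])


def lut_forward(x):
    """Evaluate the equivalent LUT: find the segment, apply its affine map."""
    seg = np.searchsorted(breaks, x, side="right") - 1
    return slopes[seg] * x + intercepts[seg]


x = np.linspace(-4.0, 4.0, 1000)
assert np.allclose(nn_forward(x), lut_forward(x))   # LUT matches the NN exactly
print("max |NN - GELU| (untrained placeholder):", np.abs(nn_forward(x) - gelu(x)).max())
```

The assertion at the end checks the property that motivates the approach: the derived LUT reproduces the network's outputs exactly, so approximation error is governed entirely by how well the small network is trained to fit the target non-linearity.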
Pages: 577-582
Page count: 6