NN-LUT: Neural Approximation of Non-Linear Operations for Efficient Transformer Inference

Cited by: 20
Authors
Yu, Joonsang [1 ]
Park, Junki [2 ]
Park, Seongmin [3 ]
Kim, Minsoo [3 ]
Lee, Sihwa [3 ]
Lee, Dong Hyun [2 ]
Choi, Jungwook [3 ]
Affiliations
[1] NAVER Clova, Seongnam, South Korea
[2] Samsung Advanced Institute of Technology, Mountain View, CA, USA
[3] Hanyang University, Seoul, South Korea
Source
Proceedings of the 59th ACM/IEEE Design Automation Conference (DAC 2022), 2022
Keywords
Neural network; Transformer; Non-linear function; Look-up table
DOI
10.1145/3489517.3530505
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Non-linear operations such as GELU, layer normalization, and Softmax are essential yet costly building blocks of Transformer models. Several prior works have simplified these operations with look-up tables or integer computations, but such approximations suffer from inferior accuracy or considerable hardware cost and long latency. This paper proposes an accurate and hardware-friendly approximation framework for efficient Transformer inference. Our framework employs a simple neural network as a universal approximator, with its structure equivalently transformed into a look-up table (LUT). The proposed framework, called Neural-network-generated LUT (NN-LUT), can accurately replace all the non-linear operations in popular BERT models with significant reductions in area, power consumption, and latency.
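To make the core transformation concrete, the following is a minimal sketch (not the authors' implementation) of how a one-hidden-layer ReLU network, which is piecewise linear by construction, can be rewritten exactly as a breakpoint table of slope/intercept segments. The function names (gelu, nn_forward, lut_forward), the hidden-unit count K = 16, and the random placeholder parameters are illustrative assumptions; in the paper the network would be trained to fit each target operation (GELU, Softmax, layer normalization) before conversion.

```python
# Minimal sketch (illustration only) of the NN-LUT idea: a one-hidden-layer
# ReLU network is piecewise linear, so it can be rewritten exactly as a small
# table of (breakpoint, slope, intercept) segments and evaluated by lookup.
import numpy as np


def gelu(x):
    """Reference non-linearity to be approximated (tanh-based GELU)."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))


# Hypothetical tiny approximator: y = sum_i v_i * relu(w_i * x + b_i) + c.
# Random placeholder parameters are used here only to show the exact
# NN -> LUT rewrite; real parameters would be trained to fit the target op.
rng = np.random.default_rng(0)
K = 16                                  # hidden units -> K breakpoints, K + 1 segments
w, b = rng.normal(size=K), rng.normal(size=K)
v, c = rng.normal(size=K), 0.0


def nn_forward(x):
    """Evaluate the one-hidden-layer ReLU network directly."""
    return np.maximum(0.0, np.outer(x, w) + b) @ v + c


# Each hidden unit switches at its breakpoint t_i = -b_i / w_i; between two
# consecutive breakpoints the set of active units is fixed, so the network is
# a single affine function there. Record slope/intercept per segment.
t = np.sort(-b / w)
edges = np.concatenate(([-np.inf], t, [np.inf]))
slopes, intercepts = [], []
for lo, hi in zip(edges[:-1], edges[1:]):
    # Probe a point strictly inside the segment to find the active units.
    if np.isinf(lo):
        x0 = hi - 1.0
    elif np.isinf(hi):
        x0 = lo + 1.0
    else:
        x0 = 0.5 * (lo + hi)
    active = (w * x0 + b) > 0
    slopes.append(np.sum(v[active] * w[active]))
    intercepts.append(np.sum(v[active] * b[active]) + c)
slopes, intercepts = np.array(slopes), np.array(intercepts)
breaks = edges[:-1]                     # segment j covers [breaks[j], edges[j + 1])


def lut_forward(x):
    """Evaluate the equivalent LUT: find the segment, apply its affine map."""
    seg = np.searchsorted(breaks, x, side="right") - 1
    return slopes[seg] * x + intercepts[seg]


x = np.linspace(-4.0, 4.0, 1000)
assert np.allclose(nn_forward(x), lut_forward(x))   # LUT matches the NN exactly
print("max |NN - GELU| (untrained placeholder):", np.abs(nn_forward(x) - gelu(x)).max())
```

The assertion at the end checks the property that motivates the approach: the derived LUT reproduces the network's outputs exactly, so approximation error is governed entirely by how well the small network is trained to fit the target non-linearity.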
Pages: 577-582
Page count: 6