SOLE: Hardware-Software Co-design of Softmax and LayerNorm for Efficient Transformer Inference

Times cited: 4
Authors
Wang, Wenxun [1]
Zhou, Shuchang [2]
Sun, Wenyu [1]
Sun, Peiqin [2]
Liu, Yongpan [1]
Affiliations
[1] Tsinghua Univ, Dept Elect Engn, Beijing, Peoples R China
[2] MEGVII Technol, Beijing, Peoples R China
Source
2023 IEEE/ACM INTERNATIONAL CONFERENCE ON COMPUTER AIDED DESIGN, ICCAD | 2023
Keywords
Transformers; neural networks; hardware-software co-design; softmax; layer normalization
DOI
10.1109/ICCAD57390.2023.10323725
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Transformers have shown remarkable performance in both natural language processing (NLP) and computer vision (CV) tasks. However, their real-time inference speed and efficiency are limited by the inefficiency of Softmax and Layer Normalization (LayerNorm). Previous works based on function approximation suffer from inefficient implementations because they focus on computation while disregarding memory overhead. Moreover, such methods rely on retraining to compensate for approximation error, which can be costly and inconvenient. In this paper, we present SOLE, a hardware-software co-design for Softmax and LayerNorm composed of E2Softmax and AILayerNorm. E2Softmax uses log2 quantization of the exponential function and log-based division to approximate Softmax, while AILayerNorm adopts low-precision statistic calculation. Compared with state-of-the-art designs, we achieve both low-precision calculation and low bit-width storage for Softmax and LayerNorm. Experiments show that SOLE maintains inference accuracy without retraining while offering orders-of-magnitude speedup and energy savings over GPU, achieving 3.04x and 3.86x energy-efficiency improvements and 2.82x and 3.32x area-efficiency improvements over prior state-of-the-art custom hardware for Softmax and LayerNorm, respectively.
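The abstract's description of E2Softmax (log2 quantization of the exponential plus log-based division) can be illustrated with a minimal Python sketch. This is not the authors' datapath: the names `e2softmax_sketch` and `mitchell_log2`, the 15-bit fixed-point accumulator (`frac_bits`), and the use of Mitchell's linear approximation log2(1 + f) ≈ f for the division step are all assumptions made for illustration only.

```python
import math

LOG2E = 1.0 / math.log(2.0)  # log2(e), since e^x = 2^(x * log2(e))


def mitchell_log2(n: int) -> float:
    """Approximate log2(n) for a positive integer (assumed Mitchell's method).

    Write n = 2^p * (1 + f) with 0 <= f < 1 and approximate log2(1 + f) by f.
    In hardware, p comes from a leading-one detector and f from the bits below it.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = n.bit_length() - 1          # position of the leading one
    f = (n - (1 << p)) / (1 << p)   # fractional mantissa in [0, 1)
    return p + f


def e2softmax_sketch(xs: list[float], frac_bits: int = 15) -> list[float]:
    """Hypothetical softmax approximation: log2-quantized exp + log-domain division."""
    m = max(xs)
    # 1) log2 quantization: e^(x - m) ~= 2^(-k) with k a small non-negative integer,
    #    clipped so every term still fits in the fixed-point accumulator.
    ks = [min(frac_bits, round((m - x) * LOG2E)) for x in xs]
    # 2) The denominator is a fixed-point integer; each term is just a shifted 1.
    total = sum(1 << (frac_bits - k) for k in ks)
    # 3) Log-based division: 2^(-k) / S = 2^(-(k + log2 S)), so no divider is needed.
    log2_s = mitchell_log2(total) - frac_bits
    return [2.0 ** (-(k + log2_s)) for k in ks]


if __name__ == "__main__":
    print(e2softmax_sketch([1.0, 2.0, 3.0]))
    print(e2softmax_sketch([0.0, 0.0, 0.0, 0.0]))  # uniform inputs -> 0.25 each
```

Because every quantized term is a power of two, the denominator reduces to a sum of shifted integers and the division becomes a subtraction of exponents. Mitchell's approximation never overestimates log2, so in this sketch the outputs sum to at least 1 and at most about 1.06; the ordering of the inputs is preserved.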
Pages: 9