Boost Transformer-based Language Models with GPU-Friendly Sparsity and Quantization

Cited by: 0
Authors
Yu, Chong [1 ]
Chen, Tao [2 ]
Gan, Zhongxue [1 ]
Affiliations
[1] Fudan Univ, Acad Engn & Technol, Shanghai, Peoples R China
[2] Fudan Univ, Sch Informat Sci & Technol, Shanghai, Peoples R China
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Along with the performance improvements in the NLP domain, the sizes of transformer-based language models (TLM) have also increased dramatically. Some prior works compress TLM models into more compact forms, but they do not fully consider that the hardware characteristics may not support efficient execution of these forms, so deploying TLM models on hardware with noticeable acceleration remains challenging. This paper thoroughly designs a compression scheme named GPUSQ-TLM to maximally utilize the GPU-friendly 2:4 fine-grained structured sparsity and quantization characteristics. Specifically, a dense TLM model is first pruned to meet the GPU's acceleration constraint of sparse patterns with the FP16 data type, and it is then further quantized into a fixed-point model by quantization-aware training, which provides an extra speedup from integer tensor operations on the GPU. A mixed-strategy knowledge distillation over labels, logits and feature maps is used for the best accuracy compensation during the pruning and quantization process. Experimental results show that the GPUSQ-TLM scheme achieves state-of-the-art compression on TLM models with various encoder and decoder blocks, with negligible accuracy degradation on the SQuAD, GLUE, CNN-DM & XSum and WikiText benchmarking tasks. Moreover, GPUSQ-TLM can boost actual deployment performance by up to 4.08-4.25x in latency and 6.18-6.79x in throughput on an A100 GPU.
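For illustration only, the following is a minimal sketch of the two compression ingredients the abstract describes: enforcing the GPU-friendly 2:4 fine-grained structured sparsity pattern on a weight matrix, and combining label, logit and feature-map terms into one distillation loss. It assumes PyTorch; the function names prune_2_4 and mixed_distillation_loss, and all weighting hyperparameters, are invented for this example and are not taken from the paper.

import torch
import torch.nn.functional as F


def prune_2_4(weight: torch.Tensor) -> torch.Tensor:
    # Zero out the 2 smallest-magnitude values in every group of 4 weights,
    # producing the 2:4 pattern that sparse tensor cores on A100-class GPUs
    # can execute efficiently (hypothetical helper, not the authors' code).
    rows, cols = weight.shape
    assert cols % 4 == 0, "2:4 pruning expects the last dim to be a multiple of 4"
    groups = weight.reshape(rows, cols // 4, 4)
    _, keep_idx = groups.abs().topk(2, dim=-1)   # two largest |w| per group
    mask = torch.zeros_like(groups)
    mask.scatter_(-1, keep_idx, 1.0)             # 1 where a weight is kept
    return (groups * mask).reshape(rows, cols)


def mixed_distillation_loss(student_logits, teacher_logits, labels,
                            student_feat, teacher_feat,
                            alpha=0.5, beta=0.3, gamma=0.2, T=2.0):
    # Illustrative mix of the three distillation signals named in the
    # abstract: hard labels, soft teacher logits, and feature maps.
    # The weights alpha/beta/gamma and temperature T are placeholders.
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * (T * T)
    feat = F.mse_loss(student_feat, teacher_feat)
    return alpha * hard + beta * soft + gamma * feat


if __name__ == "__main__":
    w = torch.randn(8, 16)
    sparse_w = prune_2_4(w)
    # Every group of 4 now contains exactly 2 non-zero weights.
    print((sparse_w.reshape(8, -1, 4) != 0).sum(dim=-1))

In the pipeline the abstract describes, pruning of this kind would be followed by quantization-aware training to a fixed-point format, with the distillation loss compensating accuracy at both stages; the sketch above only shows the shape of those two components, not the authors' implementation.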
Pages: 218 - 235
Number of pages: 18
Related papers
50 in total
  • [41] An Architecture for Accelerated Large-Scale Inference of Transformer-Based Language Models
    Ganiev, Amir
    Chapin, Colt
    de Andrade, Anderson
    Liu, Chen
    2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, NAACL-HLT 2021, 2021, : 163 - 169
  • [42] Influence of Language Proficiency on the Readability of Review Text and Transformer-based Models for Determining Language Proficiency
    Sazzed, Salim
    COMPANION PROCEEDINGS OF THE WEB CONFERENCE 2022, WWW 2022 COMPANION, 2022, : 881 - 886
  • [43] Bringing order into the realm of Transformer-based language models for artificial intelligence and law
    Greco, Candida M.
    Tagarelli, Andrea
    ARTIFICIAL INTELLIGENCE AND LAW, 2024, 32 (04) : 863 - 1010
  • [44] Stress Test Evaluation of Transformer-based Models in Natural Language Understanding Tasks
    Aspillaga, Carlos
    Carvallo, Andres
    Araujo, Vladimir
    PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), 2020, : 1882 - 1894
  • [45] Classifying Drug Ratings Using User Reviews with Transformer-Based Language Models
    Shiju, Akhil
    He, Zhe
    2022 IEEE 10TH INTERNATIONAL CONFERENCE ON HEALTHCARE INFORMATICS (ICHI 2022), 2022, : 163 - 169
  • [46] Transformers-sklearn: a toolkit for medical language understanding with transformer-based models
    Yang, Feihong
    Wang, Xuwen
    Ma, Hetong
    Li, Jiao
    BMC MEDICAL INFORMATICS AND DECISION MAKING, 2021, 21
  • [47] Catching but a glimpse?-Navigating crowdsourced solution spaces with transformer-based language models
    Just, Julian
    Hutter, Katja
    Fueller, Johann
    CREATIVITY AND INNOVATION MANAGEMENT, 2024, 33 (04) : 718 - 741
  • [48] No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models
    Kaddour, Jean
    Key, Oscar
    Nawrot, Piotr
    Minervini, Pasquale
    Kusner, Matt J.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [49] Transformers-sklearn: a toolkit for medical language understanding with transformer-based models
    Yang, Feihong
    Wang, Xuwen
    Ma, Hetong
    Li, Jiao
    BMC MEDICAL INFORMATICS AND DECISION MAKING, 2021, 21 (SUPPL 2)
  • [50] Transformer-Based Music Language Modelling and Transcription
    Zonios, Christos
    Pavlopoulos, John
    Likas, Aristidis
    PROCEEDINGS OF THE 12TH HELLENIC CONFERENCE ON ARTIFICIAL INTELLIGENCE, SETN 2022, 2022,