Boost Transformer-based Language Models with GPU-Friendly Sparsity and Quantization

Cited by: 0
Authors
Yu, Chong [1 ]
Chen, Tao [2 ]
Gan, Zhongxue [1 ]
Affiliations
[1] Fudan Univ, Acad Engn & Technol, Shanghai, Peoples R China
[2] Fudan Univ, Sch Informat Sci & Technol, Shanghai, Peoples R China
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Along with the performance improvements in the NLP domain, the sizes of transformer-based language models (TLM) have also increased dramatically. Some prior works compress TLM models into more compact forms, but they do not fully consider that the hardware characteristics may not support efficient execution of these forms, so deploying TLM models on hardware with noticeable acceleration remains challenging. This paper thoroughly designs a compression scheme named GPUSQ-TLM to maximally utilize the GPU-friendly 2:4 fine-grained structured sparsity and quantization characteristics. Specifically, a dense TLM model is first pruned to meet the GPU's acceleration constraint of sparse patterns with the FP16 data type, and it is then further quantized into a fixed-point model by quantization-aware training, which provides an extra speedup from integer tensor operations on the GPU. A mixed-strategy knowledge distillation over labels, logits and feature maps is used for the best accuracy compensation during the pruning and quantization process. Experimental results show that the GPUSQ-TLM scheme achieves state-of-the-art compression on TLM models with various encoder and decoder blocks, with negligible accuracy degradation on the SQuAD, GLUE, CNN-DM & XSum and WikiText benchmarking tasks. Moreover, GPUSQ-TLM can boost actual deployment performance by up to 4.08-4.25x in latency and 6.18-6.79x in throughput on an A100 GPU.
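For illustration only, the following is a minimal sketch of the two compression ingredients the abstract describes: enforcing the GPU-friendly 2:4 fine-grained structured sparsity pattern on a weight matrix, and combining label, logit and feature-map terms into one distillation loss. It assumes PyTorch; the function names prune_2_4 and mixed_distillation_loss, and all weighting hyperparameters, are invented for this example and are not taken from the paper.

import torch
import torch.nn.functional as F


def prune_2_4(weight: torch.Tensor) -> torch.Tensor:
    # Zero out the 2 smallest-magnitude values in every group of 4 weights,
    # producing the 2:4 pattern that sparse tensor cores on A100-class GPUs
    # can execute efficiently (hypothetical helper, not the authors' code).
    rows, cols = weight.shape
    assert cols % 4 == 0, "2:4 pruning expects the last dim to be a multiple of 4"
    groups = weight.reshape(rows, cols // 4, 4)
    _, keep_idx = groups.abs().topk(2, dim=-1)   # two largest |w| per group
    mask = torch.zeros_like(groups)
    mask.scatter_(-1, keep_idx, 1.0)             # 1 where a weight is kept
    return (groups * mask).reshape(rows, cols)


def mixed_distillation_loss(student_logits, teacher_logits, labels,
                            student_feat, teacher_feat,
                            alpha=0.5, beta=0.3, gamma=0.2, T=2.0):
    # Illustrative mix of the three distillation signals named in the
    # abstract: hard labels, soft teacher logits, and feature maps.
    # The weights alpha/beta/gamma and temperature T are placeholders.
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * (T * T)
    feat = F.mse_loss(student_feat, teacher_feat)
    return alpha * hard + beta * soft + gamma * feat


if __name__ == "__main__":
    w = torch.randn(8, 16)
    sparse_w = prune_2_4(w)
    # Every group of 4 now contains exactly 2 non-zero weights.
    print((sparse_w.reshape(8, -1, 4) != 0).sum(dim=-1))

In the pipeline the abstract describes, pruning of this kind would be followed by quantization-aware training to a fixed-point format, with the distillation loss compensating accuracy at both stages; the sketch above only shows the shape of those two components, not the authors' implementation.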
Pages: 218 - 235
Number of pages: 18
Related papers
50 in total
  • [41] An Architecture for Accelerated Large-Scale Inference of Transformer-Based Language Models
    Ganiev, Amir
    Chapin, Colt
    de Andrade, Anderson
    Liu, Chen
    2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, NAACL-HLT 2021, 2021, : 163 - 169
  • [42] Influence of Language Proficiency on the Readability of Review Text and Transformer-based Models for Determining Language Proficiency
    Sazzed, Salim
    COMPANION PROCEEDINGS OF THE WEB CONFERENCE 2022, WWW 2022 COMPANION, 2022, : 881 - 886
  • [43] Bringing order into the realm of Transformer-based language models for artificial intelligence and law
    Greco, Candida M.
    Tagarelli, Andrea
    ARTIFICIAL INTELLIGENCE AND LAW, 2024, 32 (04) : 863 - 1010
  • [44] Stress Test Evaluation of Transformer-based Models in Natural Language Understanding Tasks
    Aspillaga, Carlos
    Carvallo, Andres
    Araujo, Vladimir
    PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), 2020, : 1882 - 1894
  • [45] Classifying Drug Ratings Using User Reviews with Transformer-Based Language Models
    Shiju, Akhil
    He, Zhe
    2022 IEEE 10TH INTERNATIONAL CONFERENCE ON HEALTHCARE INFORMATICS (ICHI 2022), 2022, : 163 - 169
  • [46] Transformers-sklearn: a toolkit for medical language understanding with transformer-based models
    Yang, Feihong
    Wang, Xuwen
    Ma, Hetong
    Li, Jiao
    BMC MEDICAL INFORMATICS AND DECISION MAKING, 2021, 21
  • [47] Catching but a glimpse?-Navigating crowdsourced solution spaces with transformer-based language models
    Just, Julian
    Hutter, Katja
    Fueller, Johann
    CREATIVITY AND INNOVATION MANAGEMENT, 2024, 33 (04) : 718 - 741
  • [48] No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models
    Kaddour, Jean
    Key, Oscar
    Nawrot, Piotr
    Minervini, Pasquale
    Kusner, Matt J.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [49] Transformers-sklearn: a toolkit for medical language understanding with transformer-based models
    Yang, Feihong
    Wang, Xuwen
    Ma, Hetong
    Li, Jiao
    BMC MEDICAL INFORMATICS AND DECISION MAKING, 2021, 21 (SUPPL 2)
  • [50] Transformer-Based Music Language Modelling and Transcription
    Zonios, Christos
    Pavlopoulos, John
    Likas, Aristidis
    PROCEEDINGS OF THE 12TH HELLENIC CONFERENCE ON ARTIFICIAL INTELLIGENCE, SETN 2022, 2022,