Boost Transformer-based Language Models with GPU-Friendly Sparsity and Quantization

Cited by: 0
Authors
Yu, Chong [1 ]
Chen, Tao [2 ]
Gan, Zhongxue [1 ]
Affiliations
[1] Fudan Univ, Acad Engn & Technol, Shanghai, Peoples R China
[2] Fudan Univ, Sch Informat Sci & Technol, Shanghai, Peoples R China
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
As performance in the NLP domain has improved, the sizes of transformer-based language models (TLMs) have also grown dramatically. Prior works have sought to compress TLMs into more compact forms, but they do not fully account for hardware characteristics that may not support efficient execution of these forms, so deploying TLMs on hardware with noticeable acceleration remains challenging. This paper designs a compression scheme named GPUSQ-TLM that maximally exploits GPU-friendly 2:4 fine-grained structured sparsity and quantization. Specifically, a dense TLM model is first pruned to meet the GPU's acceleration constraint on sparse patterns in FP16, and then further quantized into a fixed-point model with quantization-aware training, providing an extra speedup for integer tensors on the GPU. A mixed-strategy knowledge distillation over labels, logits, and feature maps compensates for accuracy loss during the pruning and quantization process. Experimental results show that GPUSQ-TLM achieves state-of-the-art compression of TLM models with various encoder and decoder blocks, with negligible accuracy degradation on the SQuAD, GLUE, CNN-DM & XSum, and WikiText benchmarks. Moreover, GPUSQ-TLM boosts actual deployment performance by up to 4.08-4.25x in latency and 6.18-6.79x in throughput on an A100 GPU.
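The abstract refers to pruning a dense TLM into the GPU-friendly 2:4 fine-grained structured sparsity pattern before quantization-aware training. As a minimal sketch of that pattern only (not the paper's GPUSQ-TLM implementation; the function name `prune_2_to_4` and the toy tensor shapes are illustrative assumptions), magnitude-based 2:4 pruning of a weight matrix could look like this in PyTorch:

```python
import torch

def prune_2_to_4(weight: torch.Tensor) -> torch.Tensor:
    """Zero the 2 smallest-magnitude values in every group of 4 along the last dim,
    producing the 2:4 fine-grained structured sparsity pattern (illustrative sketch)."""
    out_features, in_features = weight.shape
    assert in_features % 4 == 0, "input dimension must be divisible by 4"
    groups = weight.reshape(out_features, in_features // 4, 4)
    # Keep the 2 largest-magnitude weights in each group of 4; zero the rest.
    keep_idx = groups.abs().topk(k=2, dim=-1).indices
    mask = torch.zeros_like(groups).scatter_(-1, keep_idx, 1.0)
    return (groups * mask).reshape(out_features, in_features)

if __name__ == "__main__":
    w = torch.randn(8, 16)            # toy linear-layer weight
    w_sparse = prune_2_to_4(w)
    # Exactly half of the weights remain non-zero, in the GPU-friendly 2:4 layout.
    print((w_sparse != 0).float().mean().item())   # -> 0.5
```

This sketch only produces the sparse weight layout; it does not cover the quantization-aware training or the label/logit/feature-map knowledge distillation described in the abstract, nor the sparse Tensor Core execution that delivers the reported speedups.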
Pages: 218-235
Page count: 18