Self-Distilled Quantization: Achieving High Compression Rates in Transformer-Based Language Models

Cited: 0
Authors
O'Neill, James [1]
Dutta, Sourav [1 ]
Affiliations
[1] Huawei Ireland Res Ctr, Townsend St, Dublin 2, Ireland
Source
61ST CONFERENCE OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 2 | 2023
Keywords
None listed
DOI
Not available
CLC Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
We investigate the effects of post-training quantization and quantization-aware training on the generalization of Transformer language models. We present a new method called self-distilled quantization (SDQ) that minimizes accumulative quantization errors and outperforms baselines. We apply SDQ to multilingual models XLM-R-Base and InfoXLM(Base) and demonstrate that both models can be reduced from 32-bit floating point weights to 8-bit integer weights while maintaining a high level of performance on the XGLUE benchmark. Our results also highlight the challenges of quantizing multilingual models, which must generalize to languages they were not fine-tuned on.
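The core idea described in the abstract, quantization-aware training in which the full-precision model distills knowledge into its own quantized counterpart, can be illustrated with a short sketch. The following PyTorch example is a minimal, hypothetical illustration and not the authors' exact SDQ objective: `fake_quantize`, `QuantLinear`, and `sdq_step` are illustrative names, the toy two-layer classifier stands in for XLM-R-Base / InfoXLM, and the loss simply mixes a task term with a KL self-distillation term against a frozen full-precision copy of the same network.

```python
# Minimal sketch (assumed formulation, not the paper's exact SDQ recipe):
# an 8-bit fake-quantized "student" is trained against its own frozen
# full-precision weights acting as the "teacher".
import torch
import torch.nn as nn
import torch.nn.functional as F

def fake_quantize(w: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    """Simulate symmetric per-tensor integer quantization in float,
    with a straight-through estimator so gradients pass through."""
    qmax = 2 ** (num_bits - 1) - 1                      # 127 for 8 bits
    scale = w.detach().abs().max().clamp(min=1e-8) / qmax
    w_q = torch.round(w / scale).clamp(-qmax, qmax) * scale
    return w + (w_q - w).detach()                       # STE trick

class QuantLinear(nn.Linear):
    """Linear layer whose weights are fake-quantized on every forward."""
    def forward(self, x):
        return F.linear(x, fake_quantize(self.weight), self.bias)

# Toy stand-in for a fine-tuned Transformer classifier head.
student = nn.Sequential(QuantLinear(128, 256), nn.ReLU(), QuantLinear(256, 4))

# Frozen full-precision copy of the same weights: the "self" teacher.
teacher = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 4))
teacher.load_state_dict(student.state_dict())
teacher.eval()
for p in teacher.parameters():
    p.requires_grad_(False)

optimizer = torch.optim.AdamW(student.parameters(), lr=1e-4)

def sdq_step(x, labels, alpha=0.5, temperature=2.0):
    """One quantization-aware training step: task loss on the quantized
    forward pass plus KL distillation from the full-precision teacher."""
    logits_s = student(x)                               # quantized forward
    with torch.no_grad():
        logits_t = teacher(x)                           # full-precision forward
    task = F.cross_entropy(logits_s, labels)
    distill = F.kl_div(
        F.log_softmax(logits_s / temperature, dim=-1),
        F.softmax(logits_t / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    loss = alpha * task + (1 - alpha) * distill
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example usage with random data.
loss = sdq_step(torch.randn(8, 128), torch.randint(0, 4, (8,)))
```

The straight-through estimator lets gradients bypass the non-differentiable rounding step, which is the standard mechanism behind quantization-aware training; the distillation term is what pushes the quantized model to match the behavior of its full-precision self.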
Pages: 1329-1339
Page count: 11
Related Papers (50 in total)
  • [1] TransMRSR: transformer-based self-distilled generative prior for brain MRI super-resolution
    Huang, Shan
    Liu, Xiaohong
    Tan, Tao
    Hu, Menghan
    Wei, Xiaoer
    Chen, Tingli
    Sheng, Bin
    VISUAL COMPUTER, 2023, 39 (08): 3647 - 3659
  • [2] Boost Transformer-based Language Models with GPU-Friendly Sparsity and Quantization
    Yu, Chong
    Chen, Tao
    Gan, Zhongxue
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, 2023, : 218 - 235
  • [3] SELF-DISTILLED DYNAMIC FUSION NETWORK FOR LANGUAGE-BASED FASHION RETRIEVAL
    Wu, Yiming
    Li, Hangfei
    Wang, Fangfang
    Zhang, Yilong
    Liang, Ronghua
    2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, ICASSP 2024, 2024, : 3260 - 3264
  • [4] Blockwise compression of transformer-based models without retraining
    Dong, Gaochen
    Chen, W.
    NEURAL NETWORKS, 2024, 171 : 423 - 428
  • [5] Ouroboros: On Accelerating Training of Transformer-Based Language Models
    Yang, Qian
    Huo, Zhouyuan
    Wang, Wenlin
    Huang, Heng
    Carin, Lawrence
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [6] Transformer-Based Language Models for Software Vulnerability Detection
    Thapa, Chandra
    Jang, Seung Ick
    Ahmed, Muhammad Ejaz
    Camtepe, Seyit
    Pieprzyk, Josef
    Nepal, Surya
    PROCEEDINGS OF THE 38TH ANNUAL COMPUTER SECURITY APPLICATIONS CONFERENCE, ACSAC 2022, 2022, : 481 - 496
  • [7] A Comparison of Transformer-Based Language Models on NLP Benchmarks
    Greco, Candida Maria
    Tagarelli, Andrea
    Zumpano, Ester
    NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS (NLDB 2022), 2022, 13286 : 490 - 501
  • [8] RadBERT: Adapting Transformer-based Language Models to Radiology
    Yan, An
    McAuley, Julian
    Lu, Xing
    Du, Jiang
    Chang, Eric Y.
    Gentili, Amilcare
    Hsu, Chun-Nan
    RADIOLOGY-ARTIFICIAL INTELLIGENCE, 2022, 4 (04)
  • [9] Applications of transformer-based language models in bioinformatics: a survey
    Zhang, Shuang
    Fan, Rui
    Liu, Yuti
    Chen, Shuang
    Liu, Qiao
    Zeng, Wanwen
    NEURO-ONCOLOGY ADVANCES, 2023, 5 (01)