Automatic Mixed-Precision Quantization Search of BERT

Cited: 0
Authors:
Zhao, Changsheng [1]
Hua, Ting [1]
Shen, Yilin [1]
Lou, Qian [1]
Jin, Hongxia [1]
Affiliations:
[1] Samsung Research America, Mountain View, CA 94043, USA
Source:
PROCEEDINGS OF THE THIRTIETH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2021 (2021)
Keywords: none listed
DOI: not available
Chinese Library Classification: TP18 (Artificial Intelligence Theory)
Subject classification codes: 081104; 0812; 0835; 1405
Abstract:
Pre-trained language models such as BERT have shown remarkable effectiveness in various natural language processing tasks. However, these models usually contain millions of parameters, which prevents their practical deployment on resource-constrained devices. Knowledge distillation, weight pruning, and quantization are the main directions in model compression. However, compact models obtained through knowledge distillation may suffer from a significant accuracy drop even at a relatively small compression ratio. On the other hand, only a few quantization attempts are specifically designed for natural language processing tasks, and they suffer from a small compression ratio or a large error rate because hyper-parameters must be set manually and fine-grained, subgroup-wise quantization is not supported. In this paper, we propose an automatic mixed-precision quantization framework for BERT that conducts quantization and pruning simultaneously at the subgroup level. Specifically, the proposed method leverages Differentiable Neural Architecture Search to automatically assign a scale and precision to the parameters in each subgroup, while pruning redundant groups of parameters at the same time. Extensive evaluations on BERT downstream tasks show that the proposed method outperforms baselines, matching their performance with a much smaller model size. We also show the feasibility of obtaining an extremely lightweight model by combining our solution with orthogonal methods such as DistilBERT.
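To make the abstract's core idea concrete, below is a minimal sketch (not the authors' code) of a DNAS-style relaxation for subgroup-wise mixed-precision selection: each weight subgroup carries learnable logits over candidate bit-widths, the forward pass mixes the corresponding fake-quantized copies, and a 0-bit candidate acts as pruning. The class name SubgroupMixedQuantLinear, the candidate set (0, 2, 4, 8), and the size penalty are illustrative assumptions, not the paper's API.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fake_quantize(w, bits):
    """Uniform symmetric fake-quantization with a straight-through estimator."""
    if bits == 0:                       # 0-bit candidate = subgroup pruned away
        return torch.zeros_like(w)
    qmax = 2 ** (bits - 1) - 1
    scale = w.detach().abs().max().clamp(min=1e-8) / qmax
    q = torch.round(w / scale).clamp(-qmax, qmax) * scale
    return w + (q - w).detach()         # forward quantized, backward identity

class SubgroupMixedQuantLinear(nn.Module):
    """Linear layer whose weight rows are split into subgroups, each with its
    own softmax-relaxed choice over candidate precisions (hypothetical)."""
    def __init__(self, in_features, out_features, n_subgroups=4,
                 bit_choices=(0, 2, 4, 8)):
        super().__init__()
        assert out_features % n_subgroups == 0
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)
        self.bit_choices = bit_choices
        self.n_subgroups = n_subgroups
        # One architecture-logit vector per subgroup: the DNAS search space.
        self.alpha = nn.Parameter(torch.zeros(n_subgroups, len(bit_choices)))

    def forward(self, x):
        probs = F.softmax(self.alpha, dim=-1)             # relaxed selection
        groups = self.weight.chunk(self.n_subgroups, dim=0)
        # Expected quantized weight over all candidate bit-widths per subgroup.
        mixed = [sum(p * fake_quantize(g, b)
                     for p, b in zip(row, self.bit_choices))
                 for g, row in zip(groups, probs)]
        return F.linear(x, torch.cat(mixed, dim=0))

# Usage: a size-aware search step. Penalizing the expected bit cost pushes
# the softmax toward low-precision or pruned (0-bit) subgroups; after search,
# each subgroup keeps its argmax(alpha) precision.
layer = SubgroupMixedQuantLinear(768, 768)
out = layer(torch.randn(2, 768))
bits = torch.tensor(layer.bit_choices, dtype=torch.float)
size_penalty = (F.softmax(layer.alpha, dim=-1) * bits).sum()
loss = out.pow(2).mean() + 1e-3 * size_penalty            # toy task loss
loss.backward()
```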
Pages: 3427-3433 (7 pages)
Related papers (50 in total):
  • [1] AMED: Automatic Mixed-Precision Quantization for Edge Devices
    Kimhi, Moshe
    Rozen, Tal
    Mendelson, Avi
    Baskin, Chaim
    MATHEMATICS, 2024, 12 (12)
  • [2] AutoMPQ: Automatic Mixed-Precision Neural Network Search via Few-Shot Quantization Adapter
    Xu, Ke
    Shao, Xiangyang
    Tian, Ye
    Yang, Shangshang
    Zhang, Xingyi
    IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE, 2024, : 1 - 13
  • [3] Exploration of Automatic Mixed-Precision Search for Deep Neural Networks
    Guo, Xuyang
    Huang, Yuanjun
    Cheng, Hsin-pai
    Li, Bing
    Wen, Wei
    Ma, Siyuan
    Li, Hai
    Chen, Yiran
    2019 IEEE INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE CIRCUITS AND SYSTEMS (AICAS 2019), 2019, : 276 - 278
  • [4] A Novel Mixed-Precision Quantization Approach for CNNs
    Wu, Dan
    Wang, Yanzhi
    Fei, Yuqi
    Gao, Guowang
    IEEE ACCESS, 2025, 13 : 49309 - 49319
  • [5] EVOLUTIONARY QUANTIZATION OF NEURAL NETWORKS WITH MIXED-PRECISION
    Liu, Zhenhua
    Zhang, Xinfeng
    Wang, Shanshe
    Ma, Siwei
    Gao, Wen
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 2785 - 2789
  • [6] Deployable mixed-precision quantization with co-learning and one-time search
    Wang, Shiguang
    Zhang, Zhongyu
    Ai, Guo
    Cheng, Jian
    NEURAL NETWORKS, 2025, 181
  • [7] Hardware-Centric AutoML for Mixed-Precision Quantization
    Wang, Kuan
    Liu, Zhijian
    Lin, Yujun
    Lin, Ji
    Han, Song
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2020, 128 (8-9) : 2035 - 2048
  • [8] Mixed-Precision Collaborative Quantization for Fast Object Tracking
    Xie, Yefan
    Guo, Yanwei
    Hou, Xuan
    Zheng, Jiangbin
    ADVANCES IN BRAIN INSPIRED COGNITIVE SYSTEMS, BICS 2023, 2024, 14374 : 229 - 238
  • [9] One-Shot Model for Mixed-Precision Quantization
    Koryakovskiy, Ivan
    Yakovleva, Alexandra
    Buchnev, Valentin
    Isaev, Temur
    Odinokikh, Gleb
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 7939 - 7949
  • [10] CSMPQ: Class Separability Based Mixed-Precision Quantization
    Wang, Mingkai
    Jin, Taisong
    Zhang, Miaohui
    Yu, Zhengtao
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, ICIC 2023, PT I, 2023, 14086 : 544 - 555