Automatic Mixed-Precision Quantization Search of BERT

Cited: 0
Authors:
Zhao, Changsheng [1]
Hua, Ting [1]
Shen, Yilin [1]
Lou, Qian [1]
Jin, Hongxia [1]
Affiliations:
[1] Samsung Research America, Mountain View, CA 94043, USA
Source:
PROCEEDINGS OF THE THIRTIETH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2021 (2021)
Keywords: none listed
DOI: not available
Chinese Library Classification: TP18 (Artificial Intelligence Theory)
Subject classification codes: 081104; 0812; 0835; 1405
Abstract:
Pre-trained language models such as BERT have shown remarkable effectiveness in various natural language processing tasks. However, these models usually contain millions of parameters, which prevents their practical deployment on resource-constrained devices. Knowledge distillation, weight pruning, and quantization are the main directions in model compression. However, compact models obtained through knowledge distillation may suffer from a significant accuracy drop even at a relatively small compression ratio. On the other hand, only a few quantization attempts are specifically designed for natural language processing tasks, and they suffer from a small compression ratio or a large error rate because hyper-parameters must be set manually and fine-grained, subgroup-wise quantization is not supported. In this paper, we propose an automatic mixed-precision quantization framework for BERT that conducts quantization and pruning simultaneously at the subgroup level. Specifically, the proposed method leverages Differentiable Neural Architecture Search to automatically assign a scale and precision to the parameters in each subgroup, while pruning redundant groups of parameters at the same time. Extensive evaluations on BERT downstream tasks show that the proposed method outperforms baselines, matching their performance with a much smaller model size. We also show the feasibility of obtaining an extremely lightweight model by combining our solution with orthogonal methods such as DistilBERT.
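To make the abstract's core idea concrete, below is a minimal sketch (not the authors' code) of a DNAS-style relaxation for subgroup-wise mixed-precision selection: each weight subgroup carries learnable logits over candidate bit-widths, the forward pass mixes the corresponding fake-quantized copies, and a 0-bit candidate acts as pruning. The class name SubgroupMixedQuantLinear, the candidate set (0, 2, 4, 8), and the size penalty are illustrative assumptions, not the paper's API.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fake_quantize(w, bits):
    """Uniform symmetric fake-quantization with a straight-through estimator."""
    if bits == 0:                       # 0-bit candidate = subgroup pruned away
        return torch.zeros_like(w)
    qmax = 2 ** (bits - 1) - 1
    scale = w.detach().abs().max().clamp(min=1e-8) / qmax
    q = torch.round(w / scale).clamp(-qmax, qmax) * scale
    return w + (q - w).detach()         # forward quantized, backward identity

class SubgroupMixedQuantLinear(nn.Module):
    """Linear layer whose weight rows are split into subgroups, each with its
    own softmax-relaxed choice over candidate precisions (hypothetical)."""
    def __init__(self, in_features, out_features, n_subgroups=4,
                 bit_choices=(0, 2, 4, 8)):
        super().__init__()
        assert out_features % n_subgroups == 0
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)
        self.bit_choices = bit_choices
        self.n_subgroups = n_subgroups
        # One architecture-logit vector per subgroup: the DNAS search space.
        self.alpha = nn.Parameter(torch.zeros(n_subgroups, len(bit_choices)))

    def forward(self, x):
        probs = F.softmax(self.alpha, dim=-1)             # relaxed selection
        groups = self.weight.chunk(self.n_subgroups, dim=0)
        # Expected quantized weight over all candidate bit-widths per subgroup.
        mixed = [sum(p * fake_quantize(g, b)
                     for p, b in zip(row, self.bit_choices))
                 for g, row in zip(groups, probs)]
        return F.linear(x, torch.cat(mixed, dim=0))

# Usage: a size-aware search step. Penalizing the expected bit cost pushes
# the softmax toward low-precision or pruned (0-bit) subgroups; after search,
# each subgroup keeps its argmax(alpha) precision.
layer = SubgroupMixedQuantLinear(768, 768)
out = layer(torch.randn(2, 768))
bits = torch.tensor(layer.bit_choices, dtype=torch.float)
size_penalty = (F.softmax(layer.alpha, dim=-1) * bits).sum()
loss = out.pow(2).mean() + 1e-3 * size_penalty            # toy task loss
loss.backward()
```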
Pages: 3427-3433 (7 pages)
Related papers (50 in total):
  • [1] AMED: Automatic Mixed-Precision Quantization for Edge Devices
    Kimhi, Moshe
    Rozen, Tal
    Mendelson, Avi
    Baskin, Chaim
    MATHEMATICS, 2024, 12 (12)
  • [2] AutoMPQ: Automatic Mixed-Precision Neural Network Search via Few-Shot Quantization Adapter
    Xu, Ke
    Shao, Xiangyang
    Tian, Ye
    Yang, Shangshang
    Zhang, Xingyi
    IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE, 2024, : 1 - 13
  • [3] Exploration of Automatic Mixed-Precision Search for Deep Neural Networks
    Guo, Xuyang
    Huang, Yuanjun
    Cheng, Hsin-pai
    Li, Bing
    Wen, Wei
    Ma, Siyuan
    Li, Hai
    Chen, Yiran
    2019 IEEE INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE CIRCUITS AND SYSTEMS (AICAS 2019), 2019, : 276 - 278
  • [4] A Novel Mixed-Precision Quantization Approach for CNNs
    Wu, Dan
    Wang, Yanzhi
    Fei, Yuqi
    Gao, Guowang
    IEEE ACCESS, 2025, 13 : 49309 - 49319
  • [5] EVOLUTIONARY QUANTIZATION OF NEURAL NETWORKS WITH MIXED-PRECISION
    Liu, Zhenhua
    Zhang, Xinfeng
    Wang, Shanshe
    Ma, Siwei
    Gao, Wen
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 2785 - 2789
  • [6] Deployable mixed-precision quantization with co-learning and one-time search
    Wang, Shiguang
    Zhang, Zhongyu
    Ai, Guo
    Cheng, Jian
    NEURAL NETWORKS, 2025, 181
  • [7] Hardware-Centric AutoML for Mixed-Precision Quantization
    Wang, Kuan
    Liu, Zhijian
    Lin, Yujun
    Lin, Ji
    Han, Song
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2020, 128 (8-9) : 2035 - 2048
  • [8] Mixed-Precision Collaborative Quantization for Fast Object Tracking
    Xie, Yefan
    Guo, Yanwei
    Hou, Xuan
    Zheng, Jiangbin
    ADVANCES IN BRAIN INSPIRED COGNITIVE SYSTEMS, BICS 2023, 2024, 14374 : 229 - 238
  • [9] One-Shot Model for Mixed-Precision Quantization
    Koryakovskiy, Ivan
    Yakovleva, Alexandra
    Buchnev, Valentin
    Isaev, Temur
    Odinokikh, Gleb
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 7939 - 7949
  • [10] CSMPQ: Class Separability Based Mixed-Precision Quantization
    Wang, Mingkai
    Jin, Taisong
    Zhang, Miaohui
    Yu, Zhengtao
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, ICIC 2023, PT I, 2023, 14086 : 544 - 555