KNOWLEDGE DISTILLATION FROM BERT IN PRE-TRAINING AND FINE-TUNING FOR POLYPHONE DISAMBIGUATION

Cited by: 0
Authors
Sun, Hao [1 ]
Tan, Xu [2 ]
Gan, Jun-Wei [3 ]
Zhao, Sheng [3 ]
Han, Dongxu [3 ]
Liu, Hongzhi [1 ]
Qin, Tao [2 ]
Liu, Tie-Yan [2 ]
Affiliations
[1] Peking Univ, Beijing, Peoples R China
[2] Microsoft Res, Beijing, Peoples R China
[3] Microsoft STC Asia, Beijing, Peoples R China
Source
2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019) | 2019
Keywords
Polyphone Disambiguation; Knowledge Distillation; Pre-training; Fine-tuning; BERT;
DOI
10.1109/asru46091.2019.9003918
CLC number
TP18 [Artificial Intelligence Theory];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
Polyphone disambiguation aims to select the correct pronunciation for a polyphonic word from several candidates, which is important for text-to-speech synthesis. Since the pronunciation of a polyphonic word is usually decided by its context, polyphone disambiguation can be regarded as a language understanding task. Inspired by the success of BERT for language understanding, we propose to leverage pre-trained BERT models for polyphone disambiguation. However, BERT models are usually too heavy to be served online, in terms of both memory cost and inference speed. In this work, we focus on an efficient model for polyphone disambiguation and propose a two-stage knowledge distillation method that transfers the knowledge from a heavy BERT model, in both the pre-training and fine-tuning stages, to a lightweight BERT model, in order to reduce online serving cost. Experiments on Chinese and English polyphone disambiguation datasets demonstrate that our method reduces model parameters by a factor of 5 and improves inference speed by 7 times, while nearly matching the classification accuracy of the original BERT model (95.4% on Chinese and 98.1% on English).
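The abstract's core mechanism is soft-label knowledge distillation: the lightweight student is trained to match the heavy teacher's output distribution over candidate pronunciations. The sketch below is a minimal, dependency-free illustration of that objective (temperature-scaled KL divergence, as in standard distillation); the temperature `T`, the toy logits, and the function names are illustrative assumptions, not the paper's actual implementation.

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax over a list of logits."""
    exps = [math.exp(x / T) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    A higher T softens both distributions, so the student learns the
    teacher's relative preferences among candidate pronunciations,
    not just its top choice. The T^2 factor keeps gradient magnitudes
    comparable across temperatures (standard distillation practice).
    """
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    return T * T * sum(
        p * math.log(p / q) for p, q in zip(p_teacher, p_student)
    )

# Toy example with three candidate pronunciations: a student whose
# ranking agrees with the teacher incurs a smaller loss.
teacher = [4.0, 1.0, 0.5]
aligned = distillation_loss([3.0, 1.2, 0.4], teacher)
misaligned = distillation_loss([0.4, 1.2, 3.0], teacher)
assert aligned < misaligned
```

In the paper's two-stage setup, a loss of this kind would be applied both during pre-training (distilling the teacher's masked-language-model predictions) and during fine-tuning (distilling its pronunciation-classification outputs).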
Pages: 168-175 (8 pages)
Related papers
50 total
  • [21] Fine-tuning long short-term memory models for seamless transition in hydrological modelling: From pre-training to post-application
    Chen, Xingtian
    Zhang, Yuhang
    Ye, Aizhong
    Li, Jinyang
    Hsu, Kuolin
    Sorooshian, Soroosh
    ENVIRONMENTAL MODELLING & SOFTWARE, 2025, 186
  • [22] SelfCCL: Curriculum Contrastive Learning by Transferring Self-Taught Knowledge for Fine-Tuning BERT
    Dehghan, Somaiyeh
    Amasyali, Mehmet Fatih
    APPLIED SCIENCES-BASEL, 2023, 13 (03):
  • [23] KNOWLEDGE DISTILLATION INSPIRED FINE-TUNING OF TUCKER DECOMPOSED CNNs AND ADVERSARIAL ROBUSTNESS ANALYSIS
    Sadhukhan, Ranajoy
    Saha, Avinab
    Mukhopadhyay, Jayanta
    Patra, Amit
    2020 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2020, : 1876 - 1880
  • [24] Fine-Tuning BERT Model for Materials Named Entity Recognition
    Zhao, Xintong
    Greenberg, Jane
    An, Yuan
    Hu, Xiaohua Tony
    2021 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2021, : 3717 - 3720
  • [25] Research frontiers of pre-training mathematical models based on BERT
    Li, Guang
    Wang, Wennan
    Zhu, Liukai
    Peng, Jun
    Li, Xujia
    Luo, Ruijie
    2022 IEEE INTERNATIONAL CONFERENCE ON ELECTRICAL ENGINEERING, BIG DATA AND ALGORITHMS (EEBDA), 2022, : 154 - 158
  • [26] POS-BERT: Point cloud one-stage BERT pre-training
    Fu, Kexue
    Gao, Peng
    Liu, Shaolei
    Qu, Linhao
    Gao, Longxiang
    Wang, Manning
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 240
  • [27] A Pairwise Probe for Understanding BERT Fine-Tuning on Machine Reading Comprehension
    Cai, Jie
    Zhu, Zhengzhou
    Nie, Ping
    Liu, Qian
    PROCEEDINGS OF THE 43RD INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '20), 2020, : 1665 - 1668
  • [28] An Application of Transfer Learning: Fine-Tuning BERT for Spam Email Classification
    Bhopale, Amol P.
    Tiwari, Ashish
    MACHINE LEARNING AND BIG DATA ANALYTICS (PROCEEDINGS OF INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND BIG DATA ANALYTICS (ICMLBDA) 2021), 2022, 256 : 67 - 77
  • [29] Predicting Protein-DNA Binding Sites by Fine-Tuning BERT
    Zhang, Yue
    Chen, Yuehui
    Chen, Baitong
    Cao, Yi
    Chen, Jiazi
    Cong, Hanhan
    INTELLIGENT COMPUTING THEORIES AND APPLICATION, ICIC 2022, PT II, 2022, 13394 : 663 - 669
  • [30] Fine-Tuning BERT on Coarse-Grained Labels: Exploring Hidden States for Fine-Grained Classification
    Anjum, Aftab
    Krestel, Ralf
    NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS, PT I, NLDB 2024, 2024, 14762 : 1 - 15