KNOWLEDGE DISTILLATION FROM BERT IN PRE-TRAINING AND FINE-TUNING FOR POLYPHONE DISAMBIGUATION

Cited by: 0
Authors
Sun, Hao [1 ]
Tan, Xu [2 ]
Gan, Jun-Wei [3 ]
Zhao, Sheng [3 ]
Han, Dongxu [3 ]
Liu, Hongzhi [1 ]
Qin, Tao [2 ]
Liu, Tie-Yan [2 ]
Affiliations
[1] Peking Univ, Beijing, Peoples R China
[2] Microsoft Res, Beijing, Peoples R China
[3] Microsoft STC Asia, Beijing, Peoples R China
Source
2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019) | 2019
Keywords
Polyphone Disambiguation; Knowledge Distillation; Pre-training; Fine-tuning; BERT
DOI
10.1109/asru46091.2019.9003918
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Polyphone disambiguation aims to select the correct pronunciation for a polyphonic word from several candidates, which is important for text-to-speech synthesis. Since the pronunciation of a polyphonic word is usually decided by its context, polyphone disambiguation can be regarded as a language understanding task. Inspired by the success of BERT for language understanding, we propose to leverage pre-trained BERT models for polyphone disambiguation. However, BERT models are usually too heavy to be served online, in terms of both memory cost and inference speed. In this work, we focus on an efficient model for polyphone disambiguation and propose a two-stage knowledge distillation method that transfers knowledge from a heavy BERT model to a lightweight BERT model in both the pre-training and fine-tuning stages, in order to reduce online serving cost. Experiments on Chinese and English polyphone disambiguation datasets demonstrate that our method reduces the number of model parameters by a factor of 5 and improves inference speed by a factor of 7, while nearly matching the classification accuracy of the original BERT model (95.4% on Chinese and 98.1% on English).
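The two-stage method summarized above distills a heavy teacher BERT into a lightweight student BERT in both pre-training and fine-tuning. The sketch below illustrates the general idea with a standard soft-target distillation loss for the fine-tuning (pronunciation classification) stage. It is a minimal PyTorch sketch under stated assumptions: the function name, temperature, and mixing weight alpha are illustrative, not the authors' released implementation, and the pre-training-stage distillation over masked-language-model targets is not shown.

```python
# Minimal sketch (an assumption, not the authors' code) of soft-target knowledge
# distillation from a teacher BERT to a lightweight student BERT for
# polyphone (pronunciation) classification.
import torch
import torch.nn.functional as F


def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      temperature: float = 2.0,
                      alpha: float = 0.5) -> torch.Tensor:
    """Mix the teacher's softened predictions with the hard-label task loss."""
    # Soften both distributions with a temperature before comparing them.
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    # KL divergence to the teacher's soft targets, rescaled by T^2 as is conventional.
    kd = F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2
    # Ordinary cross-entropy against the ground-truth pronunciation labels.
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce


# Usage sketch for the fine-tuning stage (model names are hypothetical):
#   with torch.no_grad():
#       teacher_logits = teacher_bert(input_ids).logits
#   student_logits = student_bert(input_ids).logits
#   loss = distillation_loss(student_logits, teacher_logits, labels)
```

In practice the teacher's logits are computed with gradients disabled so only the student is updated; a similar loss could, in principle, be applied at the pre-training stage by using the teacher's masked-token distributions as the soft targets.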
Pages: 168-175
Number of pages: 8
Related Papers
50 records in total
  • [1] Investigation of improving the pre-training and fine-tuning of BERT model for biomedical relation extraction
    Su, Peng
    Vijay-Shanker, K.
    BMC BIOINFORMATICS, 2022, 23 (01)
  • [2] Knowledge-guided pre-training and fine-tuning: Video representation learning for action recognition
    Wang, Guanhong
    Zhou, Yang
    He, Zhanhao
    Lu, Keyu
    Feng, Yang
    Liu, Zuozhu
    Wang, Gaoang
    NEUROCOMPUTING, 2024, 571
  • [3] Pre-Training and Fine-Tuning with Next Sentence Prediction for Multimodal Entity Linking
    Li, Lu
    Wang, Qipeng
    Zhao, Baohua
    Li, Xinwei
    Zhou, Aihua
    Wu, Hanqian
    ELECTRONICS, 2022, 11 (14)
  • [4] Improving BERT Fine-Tuning via Self-Ensemble and Self-Distillation
    Xu, Yi-Ge
    Qiu, Xi-Peng
    Zhou, Li-Gao
    Huang, Xuan-Jing
    JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2023, 38 (04) : 853 - 866
  • [5] From pre-training to fine-tuning: An in-depth analysis of Large Language Models in the biomedical domain
    Bonfigli, Agnese
    Bacco, Luca
    Merone, Mario
    Dell'Orletta, Felice
    ARTIFICIAL INTELLIGENCE IN MEDICINE, 2024, 157
  • [6] Pre-training using pseudo images and fine-tuning using real images for nighttime traffic sign detection
    Yamamoto, M.
    Ohashi, G.
    IEEJ TRANSACTIONS ON ELECTRONICS, INFORMATION AND SYSTEMS, 2021, 141 (09) : 969 - 976
  • [7] Chinese Medical Named Entity Recognition based on Expert Knowledge and Fine-tuning Bert
    Zhang, Bofeng
    Yao, Xiuhong
    Li, Haiyan
    Aini, Mirensha
    2023 IEEE INTERNATIONAL CONFERENCE ON KNOWLEDGE GRAPH (ICKG), 2023 : 84 - 90
  • [8] Fine-Tuning Channel-Pruned Deep Model via Knowledge Distillation
    Zhang, Chong
    Wang, Hong-Zhi
    Liu, Hong-Wei
    Chen, Yi-Lin
    JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2024, 39 (06) : 1238 - 1247