KNOWLEDGE DISTILLATION FROM BERT IN PRE-TRAINING AND FINE-TUNING FOR POLYPHONE DISAMBIGUATION

Cited by: 0
Authors
Sun, Hao [1 ]
Tan, Xu [2 ]
Gan, Jun-Wei [3 ]
Zhao, Sheng [3 ]
Han, Dongxu [3 ]
Liu, Hongzhi [1 ]
Qin, Tao [2 ]
Liu, Tie-Yan [2 ]
Affiliations
[1] Peking Univ, Beijing, Peoples R China
[2] Microsoft Res, Beijing, Peoples R China
[3] Microsoft STC Asia, Beijing, Peoples R China
Source
2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019) | 2019
Keywords
Polyphone Disambiguation; Knowledge Distillation; Pre-training; Fine-tuning; BERT
DOI
10.1109/asru46091.2019.9003918
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Polyphone disambiguation aims to select the correct pronunciation for a polyphonic word from several candidates, which is important for text-to-speech synthesis. Since the pronunciation of a polyphonic word is usually decided by its context, polyphone disambiguation can be regarded as a language understanding task. Inspired by the success of BERT for language understanding, we propose to leverage pre-trained BERT models for polyphone disambiguation. However, BERT models are usually too heavy to be served online, in terms of both memory cost and inference speed. In this work, we focus on an efficient model for polyphone disambiguation and propose a two-stage knowledge distillation method that transfers knowledge from a heavy BERT model to a lightweight BERT model in both the pre-training and fine-tuning stages, in order to reduce online serving cost. Experiments on Chinese and English polyphone disambiguation datasets demonstrate that our method reduces the number of model parameters by a factor of 5 and improves inference speed by a factor of 7, while nearly matching the classification accuracy of the original BERT model (95.4% on Chinese and 98.1% on English).
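The two-stage method summarized above distills a heavy teacher BERT into a lightweight student BERT in both pre-training and fine-tuning. The sketch below illustrates the general idea with a standard soft-target distillation loss for the fine-tuning (pronunciation classification) stage. It is a minimal PyTorch sketch under stated assumptions: the function name, temperature, and mixing weight alpha are illustrative, not the authors' released implementation, and the pre-training-stage distillation over masked-language-model targets is not shown.

```python
# Minimal sketch (an assumption, not the authors' code) of soft-target knowledge
# distillation from a teacher BERT to a lightweight student BERT for
# polyphone (pronunciation) classification.
import torch
import torch.nn.functional as F


def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      temperature: float = 2.0,
                      alpha: float = 0.5) -> torch.Tensor:
    """Mix the teacher's softened predictions with the hard-label task loss."""
    # Soften both distributions with a temperature before comparing them.
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    # KL divergence to the teacher's soft targets, rescaled by T^2 as is conventional.
    kd = F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2
    # Ordinary cross-entropy against the ground-truth pronunciation labels.
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce


# Usage sketch for the fine-tuning stage (model names are hypothetical):
#   with torch.no_grad():
#       teacher_logits = teacher_bert(input_ids).logits
#   student_logits = student_bert(input_ids).logits
#   loss = distillation_loss(student_logits, teacher_logits, labels)
```

In practice the teacher's logits are computed with gradients disabled so only the student is updated; a similar loss could, in principle, be applied at the pre-training stage by using the teacher's masked-token distributions as the soft targets.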
Pages: 168-175
Number of pages: 8
Related Papers
50 records in total
  • [1] Investigation of improving the pre-training and fine-tuning of BERT model for biomedical relation extraction
    Su, Peng
    Vijay-Shanker, K.
    BMC BIOINFORMATICS, 2022, 23 (01)
  • [2] Knowledge-guided pre-training and fine-tuning: Video representation learning for action recognition
    Wang, Guanhong
    Zhou, Yang
    He, Zhanhao
    Lu, Keyu
    Feng, Yang
    Liu, Zuozhu
    Wang, Gaoang
    NEUROCOMPUTING, 2024, 571
  • [3] Pre-Training and Fine-Tuning with Next Sentence Prediction for Multimodal Entity Linking
    Li, Lu
    Wang, Qipeng
    Zhao, Baohua
    Li, Xinwei
    Zhou, Aihua
    Wu, Hanqian
    ELECTRONICS, 2022, 11 (14)
  • [4] Improving BERT Fine-Tuning via Self-Ensemble and Self-Distillation
    Xu, Yi-Ge
    Qiu, Xi-Peng
    Zhou, Li-Gao
    Huang, Xuan-Jing
    JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2023, 38 (04) : 853 - 866
  • [5] From pre-training to fine-tuning: An in-depth analysis of Large Language Models in the biomedical domain
    Bonfigli, Agnese
    Bacco, Luca
    Merone, Mario
    Dell'Orletta, Felice
    ARTIFICIAL INTELLIGENCE IN MEDICINE, 2024, 157
  • [6] Pre-training using pseudo images and fine-tuning using real images for nighttime traffic sign detection
    Yamamoto, M.
    Ohashi, G.
    IEEJ TRANSACTIONS ON ELECTRONICS, INFORMATION AND SYSTEMS, 2021, 141 (09) : 969 - 976
  • [7] Chinese Medical Named Entity Recognition based on Expert Knowledge and Fine-tuning Bert
    Zhang, Bofeng
    Yao, Xiuhong
    Li, Haiyan
    Aini, Mirensha
    2023 IEEE INTERNATIONAL CONFERENCE ON KNOWLEDGE GRAPH (ICKG), 2023 : 84 - 90
  • [8] Fine-Tuning Channel-Pruned Deep Model via Knowledge Distillation
    Zhang, Chong
    Wang, Hong-Zhi
    Liu, Hong-Wei
    Chen, Yi-Lin
    JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2024, 39 (06) : 1238 - 1247