KNOWLEDGE DISTILLATION FROM BERT IN PRE-TRAINING AND FINE-TUNING FOR POLYPHONE DISAMBIGUATION

Times Cited: 0
Authors
Sun, Hao [1 ]
Tan, Xu [2 ]
Gan, Jun-Wei [3 ]
Zhao, Sheng [3 ]
Han, Dongxu [3 ]
Liu, Hongzhi [1 ]
Qin, Tao [2 ]
Liu, Tie-Yan [2 ]
Affiliations
[1] Peking Univ, Beijing, Peoples R China
[2] Microsoft Res, Beijing, Peoples R China
[3] Microsoft STC Asia, Beijing, Peoples R China
Source
2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019) | 2019
Keywords
Polyphone Disambiguation; Knowledge Distillation; Pre-training; Fine-tuning; BERT;
DOI
10.1109/asru46091.2019.9003918
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
Polyphone disambiguation aims to select the correct pronunciation for a polyphonic word from several candidates, which is important for text-to-speech synthesis. Since the pronunciation of a polyphonic word is usually decided by its context, polyphone disambiguation can be regarded as a language understanding task. Inspired by the success of BERT for language understanding, we propose to leverage pre-trained BERT models for polyphone disambiguation. However, BERT models are usually too heavy to be served online, in terms of both memory cost and inference speed. In this work, we focus on an efficient model for polyphone disambiguation and propose a two-stage knowledge distillation method that transfers knowledge from a heavy BERT model to a lightweight BERT model in both the pre-training and fine-tuning stages, in order to reduce online serving cost. Experiments on Chinese and English polyphone disambiguation datasets demonstrate that our method reduces model parameters by a factor of 5 and improves inference speed by 7 times, while nearly matching the classification accuracy (95.4% on Chinese and 98.1% on English) of the original BERT model.
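The abstract describes the two-stage distillation only at a high level. Below is a minimal sketch of the kind of soft-label distillation objective such a teacher-student setup typically uses; it is not the authors' implementation, and the function name (distillation_loss), the temperature, the loss weighting (alpha), and the example dimensions are illustrative assumptions.

# Minimal sketch (assumed, not from the paper) of a distillation objective:
# a lightweight student BERT is trained to match a heavy teacher BERT's soft
# predictions in addition to the ground-truth pronunciation labels.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Combine hard-label cross-entropy with soft-label KL divergence."""
    # Hard-label loss against the gold pronunciation class.
    ce = F.cross_entropy(student_logits, labels)
    # Soft-label loss: student mimics the teacher's softened distribution.
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    return alpha * ce + (1.0 - alpha) * kd

# Example: batch of 4 polyphonic-character instances, 10 candidate pronunciations.
student_logits = torch.randn(4, 10, requires_grad=True)
teacher_logits = torch.randn(4, 10)          # produced by the frozen teacher BERT
labels = torch.randint(0, 10, (4,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()

In the two-stage setting described in the abstract, an objective of this form would be applied twice: once against the pre-trained teacher during the student's pre-training, and once against the fine-tuned teacher during fine-tuning on the polyphone disambiguation datasets.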
Pages: 168-175
Number of pages: 8
Related Papers
50 records in total
  • [31] Boosting fine-tuning via Conditional Online Knowledge Transfer
    Liu, Zhiqiang
    Li, Yuhong
    Huang, Chengkai
    Luo, KunTing
    Liu, Yanxia
    NEURAL NETWORKS, 2024, 169 : 325 - 333
  • [32] On Effectiveness of Further Pre-training on BERT Models for Story Point Estimation
    Amasaki, Sousuke
    PROCEEDINGS OF THE 19TH INTERNATIONAL CONFERENCE ON PREDICTIVE MODELS AND DATA ANALYTICS IN SOFTWARE ENGINEERING, PROMISE 2023, 2023, : 49 - 53
  • [33] Research Paper Classification and Recommendation System based-on Fine-Tuning BERT
    Biswas, Dipto
    Gil, Joon-Min
    2023 IEEE 24TH INTERNATIONAL CONFERENCE ON INFORMATION REUSE AND INTEGRATION FOR DATA SCIENCE, IRI, 2023, : 295 - 296
  • [34] Incorporating Scenario Knowledge into A Unified Fine-tuning Architecture for Event Representation
    Zheng, Jianming
    Cai, Fei
    Chen, Honghui
    PROCEEDINGS OF THE 43RD INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '20), 2020, : 249 - 258
  • [35] Short Answer Questions Generation by Fine-Tuning BERT and GPT-2
    Tsai, Danny C. L.
    Chang, Willy J. W.
    Yang, Stephen J. H.
    29TH INTERNATIONAL CONFERENCE ON COMPUTERS IN EDUCATION (ICCE 2021), VOL II, 2021, : 508 - 514
  • [36] A study on training fine-tuning of convolutional neural networks
    Cai, Zhicheng
    Peng, Chenglei
    2021 13TH INTERNATIONAL CONFERENCE ON KNOWLEDGE AND SMART TECHNOLOGY (KST-2021), 2021, : 84 - 89
  • [37] Knowledge Graph Fusion for Language Model Fine-Tuning
    Bhana, Nimesh
    van Zyl, Terence L.
    2022 9TH INTERNATIONAL CONFERENCE ON SOFT COMPUTING & MACHINE INTELLIGENCE, ISCMI, 2022, : 167 - 172
  • [38] The human-centric framework integrating knowledge distillation architecture with fine-tuning mechanism for equipment health monitoring
    Dang, Jr-Fong
    Chen, Tzu-Li
    Huang, Hung-Yi
    ADVANCED ENGINEERING INFORMATICS, 2025, 65
  • [39] One-Step Knowledge Distillation and Fine-Tuning in Using Large Pre-Trained Self-Supervised Learning Models for Speaker Verification
    Heo, Jungwoo
    Lim, Chan-yeong
    Kim, Ju-ho
    Shin, Hyun-seo
    Yu, Ha-Jin
    INTERSPEECH 2023, 2023, : 5271 - 5275
  • [40] TRAINING EARLY-EXIT ARCHITECTURES FOR AUTOMATIC SPEECH RECOGNITION: FINE-TUNING PRE-TRAINED MODELS OR TRAINING FROM SCRATCH
    Wright, George August
    Cappellazzo, Umberto
    Zaiem, Salah
    Raj, Desh
    Yang, Lucas Ondel
    Falavigna, Daniele
    Ali, Mohamed Nabih
    Brutti, Alessandro
    2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING WORKSHOPS, ICASSPW 2024, 2024, : 685 - 689