KNOWLEDGE DISTILLATION FROM BERT IN PRE-TRAINING AND FINE-TUNING FOR POLYPHONE DISAMBIGUATION

Cited by: 0
Authors
Sun, Hao [1 ]
Tan, Xu [2 ]
Gan, Jun-Wei [3 ]
Zhao, Sheng [3 ]
Han, Dongxu [3 ]
Liu, Hongzhi [1 ]
Qin, Tao [2 ]
Liu, Tie-Yan [2 ]
Affiliations
[1] Peking Univ, Beijing, Peoples R China
[2] Microsoft Res, Beijing, Peoples R China
[3] Microsoft STC Asia, Beijing, Peoples R China
Source
2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019) | 2019
Keywords
Polyphone Disambiguation; Knowledge Distillation; Pre-training; Fine-tuning; BERT;
DOI
10.1109/asru46091.2019.9003918
CLC number
TP18 [Artificial Intelligence Theory];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
Polyphone disambiguation aims to select the correct pronunciation for a polyphonic word from several candidates, which is important for text-to-speech synthesis. Since the pronunciation of a polyphonic word is usually decided by its context, polyphone disambiguation can be regarded as a language understanding task. Inspired by the success of BERT for language understanding, we propose to leverage pre-trained BERT models for polyphone disambiguation. However, BERT models are usually too heavy to be served online, in terms of both memory cost and inference speed. In this work, we focus on an efficient model for polyphone disambiguation and propose a two-stage knowledge distillation method that transfers the knowledge from a heavy BERT model, in both the pre-training and fine-tuning stages, to a lightweight BERT model, in order to reduce online serving cost. Experiments on Chinese and English polyphone disambiguation datasets demonstrate that our method reduces model parameters by a factor of 5 and improves inference speed by 7 times, while nearly matching the classification accuracy of the original BERT model (95.4% on Chinese and 98.1% on English).
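The abstract's core mechanism is soft-label knowledge distillation: the lightweight student is trained to match the heavy teacher's output distribution over candidate pronunciations. The sketch below is a minimal, dependency-free illustration of that objective (temperature-scaled KL divergence, as in standard distillation); the temperature `T`, the toy logits, and the function names are illustrative assumptions, not the paper's actual implementation.

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax over a list of logits."""
    exps = [math.exp(x / T) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    A higher T softens both distributions, so the student learns the
    teacher's relative preferences among candidate pronunciations,
    not just its top choice. The T^2 factor keeps gradient magnitudes
    comparable across temperatures (standard distillation practice).
    """
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    return T * T * sum(
        p * math.log(p / q) for p, q in zip(p_teacher, p_student)
    )

# Toy example with three candidate pronunciations: a student whose
# ranking agrees with the teacher incurs a smaller loss.
teacher = [4.0, 1.0, 0.5]
aligned = distillation_loss([3.0, 1.2, 0.4], teacher)
misaligned = distillation_loss([0.4, 1.2, 3.0], teacher)
assert aligned < misaligned
```

In the paper's two-stage setup, a loss of this kind would be applied both during pre-training (distilling the teacher's masked-language-model predictions) and during fine-tuning (distilling its pronunciation-classification outputs).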
Pages: 168-175 (8 pages)
Related papers
50 total
  • [21] Fine-tuning long short-term memory models for seamless transition in hydrological modelling: From pre-training to post-application
    Chen, Xingtian
    Zhang, Yuhang
    Ye, Aizhong
    Li, Jinyang
    Hsu, Kuolin
    Sorooshian, Soroosh
    ENVIRONMENTAL MODELLING & SOFTWARE, 2025, 186
  • [22] SelfCCL: Curriculum Contrastive Learning by Transferring Self-Taught Knowledge for Fine-Tuning BERT
    Dehghan, Somaiyeh
    Amasyali, Mehmet Fatih
    APPLIED SCIENCES-BASEL, 2023, 13 (03):
  • [23] KNOWLEDGE DISTILLATION INSPIRED FINE-TUNING OF TUCKER DECOMPOSED CNNs AND ADVERSARIAL ROBUSTNESS ANALYSIS
    Sadhukhan, Ranajoy
    Saha, Avinab
    Mukhopadhyay, Jayanta
    Patra, Amit
    2020 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2020, : 1876 - 1880
  • [24] Fine-Tuning BERT Model for Materials Named Entity Recognition
    Zhao, Xintong
    Greenberg, Jane
    An, Yuan
    Hu, Xiaohua Tony
    2021 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2021, : 3717 - 3720
  • [25] Research frontiers of pre-training mathematical models based on BERT
    Li, Guang
    Wang, Wennan
    Zhu, Liukai
    Peng, Jun
    Li, Xujia
    Luo, Ruijie
    2022 IEEE INTERNATIONAL CONFERENCE ON ELECTRICAL ENGINEERING, BIG DATA AND ALGORITHMS (EEBDA), 2022, : 154 - 158
  • [26] POS-BERT: Point cloud one-stage BERT pre-training
    Fu, Kexue
    Gao, Peng
    Liu, Shaolei
    Qu, Linhao
    Gao, Longxiang
    Wang, Manning
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 240
  • [27] A Pairwise Probe for Understanding BERT Fine-Tuning on Machine Reading Comprehension
    Cai, Jie
    Zhu, Zhengzhou
    Nie, Ping
    Liu, Qian
    PROCEEDINGS OF THE 43RD INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '20), 2020, : 1665 - 1668
  • [28] An Application of Transfer Learning: Fine-Tuning BERT for Spam Email Classification
    Bhopale, Amol P.
    Tiwari, Ashish
    MACHINE LEARNING AND BIG DATA ANALYTICS (PROCEEDINGS OF INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND BIG DATA ANALYTICS (ICMLBDA) 2021), 2022, 256 : 67 - 77
  • [29] Predicting Protein-DNA Binding Sites by Fine-Tuning BERT
    Zhang, Yue
    Chen, Yuehui
    Chen, Baitong
    Cao, Yi
    Chen, Jiazi
    Cong, Hanhan
    INTELLIGENT COMPUTING THEORIES AND APPLICATION, ICIC 2022, PT II, 2022, 13394 : 663 - 669
  • [30] Fine-Tuning BERT on Coarse-Grained Labels: Exploring Hidden States for Fine-Grained Classification
    Anjum, Aftab
    Krestel, Ralf
    NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS, PT I, NLDB 2024, 2024, 14762 : 1 - 15