KNOWLEDGE DISTILLATION FROM BERT IN PRE-TRAINING AND FINE-TUNING FOR POLYPHONE DISAMBIGUATION

Cited: 0
Authors
Sun, Hao [1 ]
Tan, Xu [2 ]
Gan, Jun-Wei [3 ]
Zhao, Sheng [3 ]
Han, Dongxu [3 ]
Liu, Hongzhi [1 ]
Qin, Tao [2 ]
Liu, Tie-Yan [2 ]
Affiliations
[1] Peking Univ, Beijing, Peoples R China
[2] Microsoft Res, Beijing, Peoples R China
[3] Microsoft STC Asia, Beijing, Peoples R China
Source
2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019
Keywords
Polyphone Disambiguation; Knowledge Distillation; Pre-training; Fine-tuning; BERT;
DOI
10.1109/asru46091.2019.9003918
Chinese Library Classification
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Polyphone disambiguation aims to select the correct pronunciation for a polyphonic word from several candidates, which is important for text-to-speech synthesis. Since the pronunciation of a polyphonic word is usually decided by its context, polyphone disambiguation can be regarded as a language understanding task. Inspired by the success of BERT for language understanding, we propose to leverage pre-trained BERT models for polyphone disambiguation. However, BERT models are usually too heavy to be served online, in terms of both memory cost and inference speed. In this work, we focus on an efficient model for polyphone disambiguation and propose a two-stage knowledge distillation method that transfers the knowledge from a heavy BERT model in both the pre-training and fine-tuning stages to a lightweight BERT model, in order to reduce online serving cost. Experiments on Chinese and English polyphone disambiguation datasets demonstrate that our method reduces model parameters by a factor of 5 and improves inference speed by 7 times, while nearly matching the classification accuracy (95.4% on Chinese and 98.1% on English) of the original BERT model.
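The two-stage distillation described in the abstract rests on the standard idea of matching the student's output distribution to the teacher's softened predictions. As a rough illustration only (the function names, temperature value, and loss form below are generic assumptions about temperature-scaled soft-label distillation, not details taken from this paper), a minimal sketch might look like:

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; a higher temperature
    flattens the distribution, exposing 'dark knowledge' in the
    teacher's non-argmax classes."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy between the teacher's softened distribution
    (targets) and the student's softened distribution, scaled by
    T^2 so gradients keep a comparable magnitude across T."""
    p = softmax(teacher_logits, temperature)  # teacher soft targets
    q = softmax(student_logits, temperature)  # student predictions
    ce = -sum(pi * math.log(qi) for pi, qi in zip(p, q))
    return temperature ** 2 * ce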
Pages: 168 - 175 (8 pages)