Knowledge Distillation Approach for Efficient Internal Language Model Estimation

Cited by: 0
Authors
Chen, Zhipeng [1 ]
Xu, Haihua [1 ]
Khassanov, Yerbolat [1 ]
He, Yi [1 ]
Lu, Lu [1 ]
Ma, Zejun [1 ]
Wu, Ji [2 ]
Affiliations
[1] ByteDance, Beijing, Peoples R China
[2] Tsinghua Univ, Dept Elect Engn, Beijing, Peoples R China
Source
INTERSPEECH 2023, 2023
Keywords
ASR; language model; ILME; density ratio; knowledge distillation; efficiency
DOI
10.21437/Interspeech.2023-2479
CLC number
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
Internal language model estimation (ILME) has demonstrated its efficacy in domain adaptation for end-to-end (E2E) ASR. However, the performance improvement comes at extra computational cost compared with conventional shallow fusion: to estimate the internal language model prior, one must run an additional forward pass through either the ASR decoder or a separate density ratio (DR) language model (LM) for every decoded utterance. In this paper, we propose a knowledge distillation (KD) approach to realize efficient ILME for the Listen-Attend-Spell (LAS) E2E ASR model. First, we extensively explore diverse ILME and DR methods and find that the ILM can be approximated with a DR-LM much smaller than the original ASR decoder. Furthermore, to match the performance of ILME, we propose to use the estimated ILM as a teacher to train a small DR-LM by KD. In this way, we achieve the best of both worlds: performance comparable to ILME and the high efficiency of DR with a small DR-LM.
Pages: 1339-1343
Page count: 5
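
To make the abstract's core idea concrete, the following is a minimal PyTorch-style sketch (not the authors' code) of distilling an estimated internal LM (ILM) into a small density-ratio LM (DR-LM): the student is trained to match the teacher's next-token distribution via a KL-divergence loss. The names ilm_teacher, dr_lm_student, tokens, and optimizer are hypothetical placeholders, and the ILM estimation itself (e.g., running the ASR decoder with the encoder context suppressed) follows common ILME practice rather than the paper's exact configuration.

# Hypothetical sketch: distill an estimated ILM (teacher) into a small DR-LM (student).
import torch
import torch.nn.functional as F

def kd_step(ilm_teacher, dr_lm_student, tokens, optimizer, temperature=1.0):
    """One KD update: the small DR-LM student mimics the ILM teacher's
    next-token distribution over a batch of token sequences (B, T)."""
    with torch.no_grad():
        # Teacher: ILM estimated from the ASR decoder (e.g., with the encoder
        # context zeroed out), producing next-token logits of shape (B, T, V).
        teacher_logits = ilm_teacher(tokens)
        teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)

    # Student: a much smaller LM than the ASR decoder, same vocabulary.
    student_logits = dr_lm_student(tokens)                     # (B, T, V)
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)

    # KL(teacher || student), summed over time and vocabulary,
    # averaged over the batch dimension.
    loss = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

At decoding time, the distilled DR-LM would stand in for the full ILM forward pass in a density-ratio-style score such as log p_E2E(y|x) + lambda_ext * log p_ext(y) - lambda_ilm * log p_student(y); replacing the ASR-decoder (or large DR-LM) forward pass with the small student is where the claimed efficiency gain comes from.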