Knowledge Distillation Approach for Efficient Internal Language Model Estimation

Cited by: 0
Authors
Chen, Zhipeng [1 ]
Xu, Haihua [1 ]
Khassanov, Yerbolat [1 ]
He, Yi [1 ]
Lu, Lu [1 ]
Ma, Zejun [1 ]
Wu, Ji [2 ]
Affiliations
[1] ByteDance, Beijing, Peoples R China
[2] Tsinghua Univ, Dept Elect Engn, Beijing, Peoples R China
Source
INTERSPEECH 2023, 2023
Keywords
ASR; language model; ILME; density ratio; knowledge distillation; efficiency
DOI
10.21437/Interspeech.2023-2479
CLC number
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
Internal language model estimation (ILME) has demonstrated its efficacy in domain adaptation for end-to-end (E2E) ASR. However, the performance improvement comes at extra computational cost compared with conventional shallow fusion: to estimate the internal language model prior, one must run an additional forward pass through either the ASR decoder or a separate density ratio (DR) language model (LM) for every decoded utterance. In this paper, we propose a knowledge distillation (KD) approach to realize efficient ILME for the Listen-Attend-Spell (LAS) E2E ASR model. First, we extensively explore diverse ILME and DR methods and find that the ILM can be approximated with a DR-LM much smaller than the original ASR decoder. Furthermore, to match the performance of ILME, we propose to use the estimated ILM as a teacher to train a small DR-LM by KD. In this way, we achieve the best of both worlds: performance comparable to ILME and the high efficiency of DR with a small DR-LM.
Pages: 1339-1343
Page count: 5
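
To make the abstract's core idea concrete, the following is a minimal PyTorch-style sketch (not the authors' code) of distilling an estimated internal LM (ILM) into a small density-ratio LM (DR-LM): the student is trained to match the teacher's next-token distribution via a KL-divergence loss. The names ilm_teacher, dr_lm_student, tokens, and optimizer are hypothetical placeholders, and the ILM estimation itself (e.g., running the ASR decoder with the encoder context suppressed) follows common ILME practice rather than the paper's exact configuration.

# Hypothetical sketch: distill an estimated ILM (teacher) into a small DR-LM (student).
import torch
import torch.nn.functional as F

def kd_step(ilm_teacher, dr_lm_student, tokens, optimizer, temperature=1.0):
    """One KD update: the small DR-LM student mimics the ILM teacher's
    next-token distribution over a batch of token sequences (B, T)."""
    with torch.no_grad():
        # Teacher: ILM estimated from the ASR decoder (e.g., with the encoder
        # context zeroed out), producing next-token logits of shape (B, T, V).
        teacher_logits = ilm_teacher(tokens)
        teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)

    # Student: a much smaller LM than the ASR decoder, same vocabulary.
    student_logits = dr_lm_student(tokens)                     # (B, T, V)
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)

    # KL(teacher || student), summed over time and vocabulary,
    # averaged over the batch dimension.
    loss = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

At decoding time, the distilled DR-LM would stand in for the full ILM forward pass in a density-ratio-style score such as log p_E2E(y|x) + lambda_ext * log p_ext(y) - lambda_ilm * log p_student(y); replacing the ASR-decoder (or large DR-LM) forward pass with the small student is where the claimed efficiency gain comes from.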