Knowledge Distillation Approach for Efficient Internal Language Model Estimation

Cited by: 0
Authors
Chen, Zhipeng [1 ]
Xu, Haihua [1 ]
Khassanov, Yerbolat [1 ]
He, Yi [1 ]
Lu, Lu [1 ]
Ma, Zejun [1 ]
Wu, Ji [2 ]
Affiliations
[1] ByteDance, Beijing, People's Republic of China
[2] Tsinghua University, Department of Electronic Engineering, Beijing, People's Republic of China
Source
INTERSPEECH 2023, 2023
Keywords
ASR; language model; ILME; density ratio; knowledge distillation; efficiency
DOI
10.21437/Interspeech.2023-2479
Chinese Library Classification (CLC)
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
Internal language model estimation (ILME) has demonstrated its efficacy in domain adaptation for end-to-end (E2E) ASR. However, this performance improvement comes at a computational cost compared with conventional shallow fusion: to estimate the internal language model prior, one must run an extra forward pass over either the ASR decoder or a separate density ratio (DR) language model (LM) for every decoding utterance. In this paper, we propose a knowledge distillation (KD) approach to realize efficient ILME for the Listen-Attend-Spell (LAS) E2E ASR model. First, we extensively explore diverse ILME and DR methods and find that the ILM can be approximated by a DR-LM much smaller than the original ASR decoder. Furthermore, to match the performance of ILME, we employ the estimated ILM as a teacher to train a small DR-LM via KD. In this way, we achieve the best of both worlds: performance comparable to ILME and the high efficiency of DR with a small DR-LM.
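For context, a minimal sketch of the scoring rules the abstract refers to, written in the standard ILME / density-ratio notation from the literature; the specific weight symbols and the per-token KD objective below are notational assumptions for illustration, not taken from the paper itself:

    % ILME-based fusion: subtract the estimated internal LM prior
    % (\lambda_* are illustrative interpolation weights)
    \hat{y} = \arg\max_{y} \Big[ \log P_{\mathrm{E2E}}(y \mid x)
              + \lambda_{\mathrm{ext}} \log P_{\mathrm{ext}}(y)
              - \lambda_{\mathrm{ilm}} \log P_{\mathrm{ILM}}(y) \Big]

    % Density ratio (DR): approximate the prior with a source-domain LM
    \hat{y} = \arg\max_{y} \Big[ \log P_{\mathrm{E2E}}(y \mid x)
              + \lambda_{\mathrm{tgt}} \log P_{\mathrm{tgt}}(y)
              - \lambda_{\mathrm{src}} \log P_{\mathrm{src}}(y) \Big]

    % KD objective (assumed form): a small student DR-LM P_\theta is
    % trained to match the estimated ILM teacher token by token
    \mathcal{L}_{\mathrm{KD}}(\theta) = \sum_{t} \mathrm{KL}\!\left(
        P_{\mathrm{ILM}}(\cdot \mid y_{<t}) \,\Vert\, P_{\theta}(\cdot \mid y_{<t}) \right)

At decoding time, only the small student P_theta is evaluated in place of the ILM, which is where the claimed efficiency gain over running the full ASR decoder (or a decoder-sized DR-LM) comes from.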
Pages: 1339-1343
Page count: 5