Knowledge Distillation Approach for Efficient Internal Language Model Estimation

Cited: 0
Authors
Chen, Zhipeng [1 ]
Xu, Haihua [1 ]
Khassanov, Yerbolat [1 ]
He, Yi [1 ]
Lu, Lu [1 ]
Ma, Zejun [1 ]
Wu, Ji [2 ]
Affiliations
[1] ByteDance, Beijing, Peoples R China
[2] Tsinghua Univ, Dept Elect Engn, Beijing, Peoples R China
Source
INTERSPEECH 2023, 2023
Keywords
ASR; language model; ILME; density ratio; knowledge distillation; efficiency
DOI
10.21437/Interspeech.2023-2479
Chinese Library Classification (CLC)
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Internal language model estimation (ILME) has demonstrated its efficacy in domain adaptation for end-to-end (E2E) ASR. However, the performance improvement comes at additional computational cost compared with conventional shallow fusion: to estimate the internal language model prior, one must run an extra forward pass over either the ASR decoder or a separate density ratio (DR) language model (LM) for every decoded utterance. In this paper, we propose a knowledge distillation (KD) approach to realize efficient ILME for the Listen-Attend-Spell (LAS) E2E ASR model. First, we extensively explore diverse ILME and DR methods and find that the ILM can be approximated with a DR-LM much smaller than the original ASR decoder. Furthermore, to match the performance of ILME, we propose to employ the estimated ILM as a teacher to train a small DR-LM via KD. In this way, we achieve the best of both worlds: performance comparable to ILME and the high efficiency of DR with a small DR-LM.
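As a rough orientation for the decoding objectives mentioned in the abstract, the following is a minimal LaTeX sketch of standard shallow fusion, ILME, and density ratio scoring, together with a token-level KD objective in which the estimated ILM acts as the teacher for a small DR-LM. The symbols, the student parameters \theta, and the interpolation weights \lambda are generic notation assumed here for illustration, not quantities reported in the paper.

% shallow fusion: interpolate the ASR posterior with an external LM
\hat{y} = \arg\max_{y}\ \log P_{\mathrm{ASR}}(y \mid x) + \lambda_{E}\,\log P_{\mathrm{ELM}}(y)

% ILME: additionally subtract the internal LM prior estimated from the ASR decoder
\hat{y} = \arg\max_{y}\ \log P_{\mathrm{ASR}}(y \mid x) - \lambda_{I}\,\log P_{\mathrm{ILM}}(y) + \lambda_{E}\,\log P_{\mathrm{ELM}}(y)

% density ratio: subtract a separately trained source-domain LM instead of the ILM
\hat{y} = \arg\max_{y}\ \log P_{\mathrm{ASR}}(y \mid x) - \lambda_{S}\,\log P_{\mathrm{DR}}(y) + \lambda_{E}\,\log P_{\mathrm{ELM}}(y)

% KD: fit a small DR-LM P_\theta to the estimated ILM's token distributions
\mathcal{L}_{\mathrm{KD}} = \sum_{t} \mathrm{KL}\!\left( P_{\mathrm{ILM}}(\cdot \mid y_{<t}) \,\middle\|\, P_{\theta}(\cdot \mid y_{<t}) \right)

At decoding time the distilled P_\theta replaces P_{\mathrm{DR}}, so only the small student LM needs the extra forward pass.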
Pages: 1339 - 1343 (5 pages)