Prototypical Knowledge Distillation for Noise Robust Keyword Spotting

Citations: 5
Authors
Kim, Donghyeon [1]
Kim, Gwantae [1]
Lee, Bokyeung [1]
Ko, Hanseok [1]
Affiliations
[1] Korea Univ, Sch Elect Engn, Seoul 02841, South Korea
Keywords
Keyword spotting; knowledge distillation; prototypical learning
DOI
10.1109/LSP.2022.3219358
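Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract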
Keyword Spotting (KWS) is an essential component of contemporary audio-based deep learning systems and should have a minimal design when the system operates in streaming and on-device environments. In our previous work, we presented robust feature extraction with a single-layer dynamic convolution model. In this letter, we extend that study to multi-layer operation and propose a robust Knowledge Distillation (KD) learning method. Based on the distribution between class centroids and embedding vectors, we compute three distinct distance metrics for the KD training and feature extraction processes. The results indicate that our KD method achieves KWS performance comparable to state-of-the-art models at low computational cost. Furthermore, the proposed method is more robust in noisy environments than conventional KD methods.
Pages: 2298-2302
Page count: 5