Prototypical Knowledge Distillation for Noise Robust Keyword Spotting

Cited by: 5

Authors:

Kim, Donghyeon ^{[1]}

Kim, Gwantae ^{[1]}

Lee, Bokyeung ^{[1]}

Ko, Hanseok ^{[1]}

Affiliations:

[1] Korea Univ, Sch Elect Engn, Seoul 02841, South Korea

Source:

IEEE SIGNAL PROCESSING LETTERS | 2022 / Vol. 29

Keywords:

Keyword spotting; knowledge distillation; prototypical learning

DOI:

10.1109/LSP.2022.3219358

Chinese Library Classification:

TM [Electrical Engineering]; TN [Electronics and Communication Technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

Keyword Spotting (KWS) is an essential component in contemporary audio-based deep learning systems and should be minimal in design when the system operates in streaming and on-device environments. We presented a robust feature extraction method with a single-layer dynamic convolution model in our previous work. In this letter, we extend our earlier study to multi-layer operation and propose a robust Knowledge Distillation (KD) learning method. Based on the distribution between class centroids and embedding vectors, we compute three distinct distance metrics for the KD training and feature extraction processes. The results indicate that our KD method achieves KWS performance comparable to state-of-the-art models at low computational cost. Furthermore, our proposed method yields more robust performance in noisy environments than conventional KD methods.


Pages: 2298-2302

Number of pages: 5

