Knowledge Distillation for In-Memory Keyword Spotting Model

Cited by: 1

Authors:

Song, Zeyang [1]

Liu, Qi [2]

Yang, Qu [1]

Li, Haizhou [1,3,4]

Affiliations:

[1] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore, Singapore

[2] South China Univ Technol, Guangzhou, Peoples R China

[3] Chinese Univ Hong Kong, Shenzhen, Peoples R China

[4] Kriston AI, Shenzhen, Peoples R China

Source:

INTERSPEECH 2022 | 2022

Keywords:

Keyword spotting; knowledge distillation; in-memory computing; speech encoder; MFCC; SincConv

DOI:

10.21437/Interspeech.2022-633

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Subject classification codes:

070206; 082403

Abstract:

We study a light-weight implementation of keyword spotting (KWS) for voice command and control that can run on an in-memory computing (IMC) unit with the same accuracy as state-of-the-art methods at a lower computational cost. KWS is expected to be always-on on mobile devices with limited resources, and IMC is one way to meet that constraint. However, IMC supports only multiplication-accumulation and Boolean operations. Common feature extraction methods, such as MFCC and SincConv, are therefore not supported by IMC, because they depend on expensive logarithm computation. On the other hand, some neural network solutions to KWS involve a large number of parameters, which makes them infeasible for mobile devices. In this work, we propose a knowledge distillation technique that replaces a complex speech front end such as MFCC or SincConv with a light-weight encoder without performance loss. Experiments show that the proposed model outperforms KWS models with MFCC and SincConv front ends in terms of accuracy and computational cost.
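The idea of distilling a logarithm-based front end into an encoder that uses only multiply-accumulate operations can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the paper's architecture, loss, or data: the teacher is a simplified log band-energy extractor standing in for MFCC (no mel filterbank, no DCT), the student is a single linear layer, the audio is synthetic noise, and the names `frame_signal`, `teacher_features`, `student_features`, and `distill` are hypothetical.

```python
import numpy as np

def frame_signal(wave, width, stride):
    """Slice a 1-D waveform into overlapping frames, one frame per row."""
    n_frames = 1 + (len(wave) - width) // stride
    idx = np.arange(width)[None, :] + stride * np.arange(n_frames)[:, None]
    return wave[idx]

def teacher_features(frames, n_bands):
    """Stand-in teacher (assumption): log band energies. Like MFCC it needs
    a logarithm, but it omits the mel filterbank and DCT of real MFCC."""
    power = np.abs(np.fft.rfft(frames, axis=1))[:, 1:] ** 2  # drop the DC bin
    bands = power.reshape(len(frames), n_bands, -1).sum(axis=2)
    return np.log(bands + 1e-8)

def student_features(frames, weights, bias):
    """Hypothetical student encoder: one linear layer, so it uses only
    multiply-accumulate operations and no logarithm."""
    return frames @ weights.T + bias

def distill(frames, target, n_steps=200, lr=0.1):
    """Fit the student to the teacher features by gradient descent on the
    mean squared error between the two feature maps."""
    weights = np.zeros((target.shape[1], frames.shape[1]))
    bias = np.zeros(target.shape[1])
    losses = []
    for _ in range(n_steps):
        err = student_features(frames, weights, bias) - target
        losses.append(float(np.mean(err ** 2)))
        # Gradients of mean(err**2) with respect to weights and bias.
        weights -= lr * 2.0 * err.T @ frames / err.size
        bias -= lr * 2.0 * err.sum(axis=0) / err.size
    return weights, bias, losses

rng = np.random.default_rng(0)
wave = rng.standard_normal(4000)  # synthetic audio, not real speech
frames = frame_signal(wave, width=64, stride=32)
target = teacher_features(frames, n_bands=8)
weights, bias, losses = distill(frames, target)
print(f"distillation loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

After training, only `student_features` would be needed at inference time, which is the property that matters for hardware restricted to multiply-accumulate operations. A linear layer cannot reproduce a logarithm exactly, so this sketch only shows the training setup, not the accuracy reported in the abstract.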

Citation

Pages: 4128-4132

Page count: 5

