Knowledge Distillation for In-Memory Keyword Spotting Model

Cited by: 1
Authors
Song, Zeyang [1]
Liu, Qi [2]
Yang, Qu [1]
Li, Haizhou [1, 3, 4]
Affiliations
[1] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore, Singapore
[2] South China Univ Technol, Guangzhou, Peoples R China
[3] Chinese Univ Hong Kong, Shenzhen, Peoples R China
[4] Kriston AI, Shenzhen, Peoples R China
Source
INTERSPEECH 2022 | 2022
Keywords
Keyword spotting; knowledge distillation; in-memory computing; speech encoder; MFCC; SincConv
DOI
10.21437/Interspeech.2022-633
Chinese Library Classification number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
We study a lightweight implementation of keyword spotting (KWS) for voice command and control that can run on an in-memory computing (IMC) unit with the same accuracy as state-of-the-art methods at a lower computational cost. KWS is expected to be always on in mobile devices with limited resources, and IMC is one solution. However, IMC supports only multiply-accumulate and Boolean operations. We note that common feature extraction methods, such as MFCC and SincConv, are not supported by IMC because they depend on expensive logarithm computation. On the other hand, some neural network solutions to KWS involve a large number of parameters, which makes them infeasible for mobile devices. In this work, we propose a knowledge distillation technique that replaces a complex speech front-end such as MFCC or SincConv with a lightweight encoder without performance loss. Experiments show that the proposed model outperforms KWS models with MFCC and SincConv front-ends in terms of accuracy and computational cost.
Pages: 4128-4132
Page count: 5