Small Footprint Multi-channel Network for Keyword Spotting with Centroid Based Awareness

Times cited: 3
Authors
Ng, Dianwen [1, 2]
Xiao, Yang [2]
Yip, Jia Qi [1, 2]
Yang, Zhao [2]
Tian, Biao [1]
Fu, Qiang [1]
Chng, Eng Siong [2]
Ma, Bin [1]
Affiliations
[1] Alibaba Group, Speech Lab, DAMO Academy, Hangzhou, People's Republic of China
[2] Nanyang Technological University, School of Computer Science and Engineering, Singapore, Singapore
Source
INTERSPEECH 2023, 2023
Keywords
Small Footprint; Keyword Spotting; Multichannel; Noisy Far-field; Centroid Awareness
DOI
10.21437/Interspeech.2023-1210
Chinese Library Classification number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Spoken keyword spotting (KWS) in noisy far-field environments is challenging for small-footprint models, given their restricted computational resources (e.g., model size and running memory). The task becomes even harder when handling noise captured by multiple microphones. To address this, we present a new multi-channel model that uses a CNN-based network with a linear mixing unit to obtain local-global dependency representations. Our method improves noise robustness while keeping computation efficient. In addition, we propose an end-to-end centroid-based awareness module that provides class-similarity awareness at the bottleneck level to correct ambiguous cases during prediction. We conducted experiments on real noisy far-field data from the MISP Challenge 2021 and achieved state-of-the-art (SOTA) results compared with existing small-footprint KWS models. Our best score of 0.126 is highly competitive with that of much larger models such as 3D-ResNet (0.122), while our model is far smaller: 473K parameters versus 13M.
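The abstract does not specify how the centroid-based awareness module is built. As a rough illustration of the general idea of class-similarity awareness, the sketch below computes one centroid per class from bottleneck embeddings and scores a new embedding by its cosine similarity to each centroid. This is a generic nearest-centroid computation written under stated assumptions, not the authors' end-to-end module: in the paper the module is trained jointly with the network, whereas here the centroids are simply class means. The function names (`l2_normalize`, `class_centroids`, `centroid_similarities`) and the toy 2-D embeddings are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def class_centroids(embeddings, labels, num_classes):
    """Hypothetical helper: mean bottleneck embedding for each class label."""
    dim = len(embeddings[0])
    sums = [[0.0] * dim for _ in range(num_classes)]
    counts = [0] * num_classes
    for emb, label in zip(embeddings, labels):
        counts[label] += 1
        for i, x in enumerate(emb):
            sums[label][i] += x
    # Classes with no samples keep an all-zero centroid.
    return [[s / c for s in row] if c else row for row, c in zip(sums, counts)]


def centroid_similarities(embedding, centroids):
    """Hypothetical helper: cosine similarity between one embedding and each centroid."""
    e = l2_normalize(embedding)
    return [sum(a * b for a, b in zip(e, l2_normalize(c))) for c in centroids]


# Toy 2-D "bottleneck" embeddings (assumed): class 0 = keyword, class 1 = non-keyword.
train_emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
train_lab = [0, 0, 1, 1]
centroids = class_centroids(train_emb, train_lab, num_classes=2)

# Score a new embedding against every class centroid and pick the closest class.
sims = centroid_similarities([0.8, 0.3], centroids)
predicted = max(range(len(sims)), key=lambda k: sims[k])
print(centroids, sims, predicted)
```

In a real model the similarity vector would more plausibly be fed back into the classifier as an extra signal rather than used alone for the decision; how the paper combines it with the main prediction is not stated in the abstract.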
Pages: 296-300
Number of pages: 5