Small Footprint Multi-channel Network for Keyword Spotting with Centroid Based Awareness

Cited by: 3
Authors
Ng, Dianwen [1, 2]
Xiao, Yang [2 ]
Yip, Jia Qi [1, 2]
Yang, Zhao [2 ]
Tian, Biao [1 ]
Fu, Qiang [1 ]
Chng, Eng Siong [2 ]
Ma, Bin [1 ]
Affiliations
[1] Alibaba Group, Speech Lab, DAMO Academy, Hangzhou, People's Republic of China
[2] Nanyang Technological University, School of Computer Science and Engineering, Singapore, Singapore
Source
INTERSPEECH 2023 | 2023
Keywords
Small Footprint; Keyword Spotting; Multichannel; Noisy Far-field; Centroid Awareness;
DOI
10.21437/Interspeech.2023-1210
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Spoken keyword spotting (KWS) in noisy far-field environments is challenging for small-footprint models because of their restricted computational resources (e.g., model size and running memory). The task becomes harder still when noise must be handled across multiple microphones. To address this, we present a new multi-channel model that uses a CNN-based network with a linear mixing unit to learn local-global dependency representations. Our method improves noise robustness while keeping computation efficient. In addition, we propose an end-to-end centroid-based awareness module that provides class-similarity awareness at the bottleneck level to correct ambiguous cases during prediction. We conducted experiments on real noisy far-field data from the MISP Challenge 2021 and achieved state-of-the-art (SOTA) results compared with existing small-footprint KWS models. Our best score of 0.126 is highly competitive with larger models such as 3D-ResNet, which scores 0.122, while our model is much smaller at 473K compared to 13M.
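The abstract does not spell out how the centroid-based awareness module works, so the following is only a minimal sketch of the general idea of class-similarity awareness: compare an embedding against one centroid per class. It assumes, purely for illustration, that centroids are per-class mean embeddings and that similarity is cosine similarity; the paper's module is trained end to end and may differ. The function names and toy data are hypothetical.

```python
import math
from collections import defaultdict


def class_centroids(embeddings, labels):
    """Return the mean embedding per class (an illustrative assumption,
    not the paper's learned centroids)."""
    sums = {}
    counts = defaultdict(int)
    for vec, lab in zip(embeddings, labels):
        if lab not in sums:
            sums[lab] = [0.0] * len(vec)
        sums[lab] = [s + v for s, v in zip(sums[lab], vec)]
        counts[lab] += 1
    return {lab: [s / counts[lab] for s in total] for lab, total in sums.items()}


def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def centroid_similarities(query, centroids):
    """Similarity of one bottleneck-style embedding to every class centroid."""
    return {lab: cosine(query, c) for lab, c in centroids.items()}


def nearest_class(query, centroids):
    """Pick the class whose centroid is most similar to the query."""
    sims = centroid_similarities(query, centroids)
    return max(sims, key=sims.get)


# Hypothetical 2-D embeddings for two classes.
train = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0], [0.2, 0.8]]
labels = ["keyword", "keyword", "filler", "filler"]
centroids = class_centroids(train, labels)
```

In a sketch like this, the similarity scores could be fed alongside the classifier output so that an ambiguous prediction is nudged toward the class whose centroid the embedding most resembles.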
Pages: 296-300
Number of pages: 5
Related papers
50 in total
  • [1] SMALL-FOOTPRINT KEYWORD SPOTTING WITH GRAPH CONVOLUTIONAL NETWORK
    Chen, Xi
    Yin, Shouyi
    Song, Dandan
    Ouyang, Peng
    Liu, Leibo
    Wei, Shaojun
    2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 539 - 546
  • [2] Small-Footprint Keyword Spotting Based on Gated Channel Transformation Sandglass Residual Neural Network
    Zhang, Ying
    Zhu, Shirong
    Yu, Chao
    Zhao, Lasheng
    INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2022, 36 (07)
  • [3] Compressed time delay neural network for small-footprint keyword spotting
    Sun, Ming
    Snyder, David
    Gao, Yixin
    Nagaraja, Varun
    Rodehorst, Mike
    Panchapagesan, Sankaran
    Strom, Nikko
    Matsoukas, Spyros
    Vitaladevuni, Shiv
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 3607 - 3611
  • [4] Text Anchor Based Metric Learning for Small-footprint Keyword Spotting
    Wang, Li
    Gu, Rongzhi
    Chen, Nuo
    Zou, Yuexian
    INTERSPEECH 2021, 2021, : 4219 - 4223
  • [5] A Configurable Accelerator for Keyword Spotting Based on Small-Footprint Temporal Efficient Neural Network
    He, Keyan
    Chen, Dihu
    Su, Tao
    ELECTRONICS, 2022, 11 (16)
  • [6] Small-Footprint Keyword Spotting with Multi-Scale Temporal Convolution
    Li, Ximin
    Wei, Xiaodong
    Qin, Xiaowei
    INTERSPEECH 2020, 2020, : 1987 - 1991
  • [7] EXPLORING REPRESENTATION LEARNING FOR SMALL-FOOTPRINT KEYWORD SPOTTING
    Cui, Fan
    Guo, Liyong
    Wang, Quandong
    Gao, Peng
    Wang, Yujun
    INTERSPEECH 2022, 2022, : 3258 - 3262
  • [8] DEEP RESIDUAL LEARNING FOR SMALL-FOOTPRINT KEYWORD SPOTTING
    Tang, Raphael
    Lin, Jimmy
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5484 - 5488
  • [9] Model compression applied to small-footprint keyword spotting
    Tucker, George
    Wu, Minhua
    Sun, Ming
    Panchapagesan, Sankaran
    Fu, Gengshen
    Vitaladevuni, Shiv
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 1878 - 1882
  • [10] Compact Feedforward Sequential Memory Networks for Small-footprint Keyword Spotting
    Chen, Mengzhe
    Zhang, Shiliang
    Lei, Ming
    Liu, Yong
    Yao, Haitao
    Gao, Jie
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 2663 - 2667