Small Footprint Multi-channel Network for Keyword Spotting with Centroid Based Awareness

Cited by: 3

Authors:

Ng, Dianwen [1,2]

Xiao, Yang [2]

Yip, Jia Qi [1,2]

Yang, Zhao [2]

Tian, Biao [1]

Fu, Qiang [1]

Chng, Eng Siong [2]

Ma, Bin [1]

Affiliations:

[1] Alibaba Grp, Speech Lab DAMO Acad, Hangzhou, Peoples R China

[2] Nanyang Technol Univ, Sch Comp Sci & Engn, Singapore, Singapore

Source:

INTERSPEECH 2023 | 2023

Keywords:

Small Footprint; Keyword Spotting; Multichannel; Noisy Far-field; Centroid Awareness;

DOI:

10.21437/Interspeech.2023-1210

Chinese Library Classification number:

O42 [Acoustics];

Subject classification codes:

070206; 082403;

Abstract:

Spoken Keyword Spotting (KWS) in noisy far-field environments is challenging for small-footprint models, given the restrictions on computational resources (e.g., model size, running memory). The task becomes even harder when noise from multiple microphones must be handled. To address this, we present a new multi-channel model that uses a CNN-based network with a linear mixing unit to achieve local-global dependency representations. Our method enhances noise robustness while keeping computation efficient. In addition, we propose an end-to-end centroid-based awareness module that provides class similarity awareness at the bottleneck level to correct ambiguous cases during prediction. We conducted experiments using real noisy far-field data from the MISP Challenge 2021 and achieved SOTA results compared to existing small-footprint KWS models. Our best score of 0.126 is highly competitive with that of larger models such as 3D-ResNet, which scores 0.122, while our model is much smaller at 473K compared to 13M.
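
The abstract does not spell out how the centroid-based awareness module works. The idea of "class similarity awareness" can be loosely illustrated with a nearest-centroid comparison. The sketch below is an assumption-laden toy in plain Python, not the authors' end-to-end module: the function names, the two class labels, the 2-D embeddings, and the choice of cosine similarity are all hypothetical.

```python
import math


def class_centroids(embeddings, labels):
    """Average the embeddings of each class to get one centroid per class."""
    sums, counts = {}, {}
    for vec, label in zip(embeddings, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def cosine_similarity(a, b):
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def centroid_similarities(query, centroids):
    """Similarity of one query embedding to every class centroid."""
    return {label: cosine_similarity(query, c) for label, c in centroids.items()}


# Hypothetical 2-D bottleneck embeddings for two illustrative classes.
embeddings = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0], [0.2, 0.8]]
labels = ["keyword", "keyword", "non_keyword", "non_keyword"]

centroids = class_centroids(embeddings, labels)
scores = centroid_similarities([0.9, 0.1], centroids)
best = max(scores, key=scores.get)  # class whose centroid is most similar
```

In this toy setup the similarity scores could serve as an extra signal alongside a classifier's output when a prediction is ambiguous. How the paper actually integrates centroids at the bottleneck level is not stated in this record.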


Pages: 296-300

Page count: 5

