Multi-class AUC Optimization for Robust Small-footprint Keyword Spotting with Limited Training Data

Cited by: 0
Authors
Xu, Menglong [1 ]
Li, Shengqiang [1 ]
Liang, Chengdong [1 ]
Zhang, Xiao-Lei [1 ]
Affiliations
[1] Northwestern Polytech Univ, Sch Marine Sci & Technol, CIAIC, Xian, Peoples R China
Source
INTERSPEECH 2022 | 2022
Funding
US National Science Foundation;
Keywords
keyword spotting; multi-class AUC optimization;
DOI
10.21437/Interspeech.2022-11356
Chinese Library Classification
O42 [Acoustics];
Discipline Codes
070206; 082403;
Abstract
Deep neural networks provide effective solutions to small-footprint keyword spotting (KWS). However, most KWS methods use the softmax cross-entropy loss, which focuses only on maximizing classification accuracy on the training set without accounting for unseen sounds that lie outside the training data. When training data is limited, it remains challenging to achieve robust and highly accurate KWS in real-world scenarios, where unseen sounds are frequently encountered. In this paper, we propose a new KWS method consisting of a novel loss function, the maximization of the area under the receiver-operating-characteristic curve (AUC), and a confidence-based decision method. The proposed method not only maintains high keyword classification accuracy but is also robust to unseen sounds. Experimental results on the Google Speech Commands dataset v1 and v2 show that our method achieves state-of-the-art performance on most evaluation metrics.
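The two components described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the squared-hinge pairwise surrogate, the `margin` value, and the confidence `threshold` are generic illustrative choices, used here only to show the shape of a one-vs-rest multi-class AUC loss and a confidence-based rejection of unseen sounds.

```python
import numpy as np

def multiclass_auc_surrogate(scores, labels, margin=1.0):
    """One-vs-rest pairwise AUC surrogate (squared hinge), illustrative only.

    scores: (N, C) array of per-class scores; labels: (N,) integer class ids.
    For each class k, every (positive, negative) score pair whose difference
    falls below the margin is penalized, approximating 1 - AUC for class k.
    """
    n, c = scores.shape
    per_class = []
    for k in range(c):
        pos = scores[labels == k, k]   # scores of examples truly in class k
        neg = scores[labels != k, k]   # scores of all other examples
        if pos.size == 0 or neg.size == 0:
            continue                   # class absent from this batch
        diff = pos[:, None] - neg[None, :]              # all pos/neg pairs
        per_class.append(np.mean(np.maximum(0.0, margin - diff) ** 2))
    return float(np.mean(per_class))

def confidence_decision(probs, threshold=0.7):
    """Return the predicted class, or -1 ('unknown') if confidence is low."""
    probs = np.asarray(probs)
    return int(np.argmax(probs)) if probs.max() >= threshold else -1
```

A perfectly separated batch (every positive score exceeds every negative score by at least the margin) yields zero loss, while the confidence rule maps low-confidence posteriors to the "unknown" label instead of forcing a keyword decision.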
Pages: 3278-3282
Page count: 5