Multi-class AUC Optimization for Robust Small-footprint Keyword Spotting with Limited Training Data

Authors
Xu, Menglong [1]
Li, Shengqiang [1]
Liang, Chengdong [1]
Zhang, Xiao-Lei [1]
Affiliations
[1] Northwestern Polytech Univ, Sch Marine Sci & Technol, CIAIC, Xian, Peoples R China
Source
INTERSPEECH 2022 | 2022
Funding
U.S. National Science Foundation;
Keywords
keyword spotting; multi-class AUC optimization;
DOI
10.21437/Interspeech.2022-11356
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Deep neural networks provide effective solutions to small-footprint keyword spotting (KWS). However, most KWS methods use softmax with minimum cross-entropy as the loss function, which focuses only on maximizing classification accuracy on the training set and does not account for unseen sounds outside the training data. When training data is limited, it remains challenging to achieve robust and highly accurate KWS in real-world scenarios where unseen sounds are frequently encountered. In this paper, we propose a new KWS method that consists of a novel loss function, named the maximization of the area under the receiver-operating-characteristic curve (AUC), and a confidence-based decision method. The proposed KWS method not only maintains high keyword classification accuracy but is also robust to unseen sounds. Experimental results on the Google Speech Commands datasets v1 and v2 show that our method achieves state-of-the-art performance in terms of most evaluation metrics.
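
The abstract names two ingredients, an AUC-maximizing loss and a confidence-based decision rule, without giving their formulas. The sketch below is only an illustration of the general idea and is not the authors' formulation: it assumes a one-vs-rest decomposition with a squared-hinge pairwise surrogate for AUC, and a simple threshold on the top score for rejecting unseen sounds. The function names, the `margin` and `threshold` parameters, and the `unknown` label are all hypothetical.

```python
import numpy as np


def pairwise_auc_surrogate(pos_scores, neg_scores, margin=1.0):
    """Squared-hinge penalty over all (positive, negative) score pairs.

    A pair is penalized when the positive score does not exceed the
    negative score by at least `margin` (an assumed hyperparameter).
    """
    if len(pos_scores) == 0 or len(neg_scores) == 0:
        return 0.0  # no pairs to rank for this class
    # Broadcasting builds the full matrix of pairwise differences.
    diff = pos_scores[:, None] - neg_scores[None, :]
    return float(np.mean(np.maximum(0.0, margin - diff) ** 2))


def multiclass_auc_loss(scores, labels, margin=1.0):
    """Average the pairwise surrogate over one-vs-rest problems (assumed)."""
    n_classes = scores.shape[1]
    losses = []
    for c in range(n_classes):
        is_c = labels == c
        losses.append(
            pairwise_auc_surrogate(scores[is_c, c], scores[~is_c, c], margin)
        )
    return float(np.mean(losses))


def confident_decision(scores, threshold=0.5, unknown=-1):
    """Return the top class, or `unknown` when the top score is too low."""
    best = scores.argmax(axis=1)
    confidence = scores.max(axis=1)
    return np.where(confidence >= threshold, best, unknown)
```

Under these assumptions, a batch whose per-class scores rank every positive above every negative by the margin incurs zero loss, and inputs whose highest score falls below `threshold` are rejected as unknown rather than forced into a keyword class.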
Pages: 3278-3282
Number of pages: 5