Multi-class AUC Optimization for Robust Small-footprint Keyword Spotting with Limited Training Data

Cited: 0
Authors
Xu, Menglong [1 ]
Li, Shengqiang [1 ]
Liang, Chengdong [1 ]
Zhang, Xiao-Lei [1 ]
Affiliations
[1] Northwestern Polytech Univ, Sch Marine Sci & Technol, CIAIC, Xian, Peoples R China
Source
INTERSPEECH 2022 | 2022
Funding
US National Science Foundation;
Keywords
keyword spotting; multi-class AUC optimization;
DOI
10.21437/Interspeech.2022-11356
CLC Number
O42 [Acoustics];
Subject Classification Codes
070206 ; 082403 ;
Abstract
Deep neural networks provide effective solutions to small-footprint keyword spotting (KWS). However, most KWS methods use softmax with cross-entropy as the loss function, which focuses only on maximizing classification accuracy on the training set, without accounting for unseen sounds outside the training data. When training data is limited, it remains challenging to achieve robust and highly accurate KWS in real-world scenarios where unseen sounds are frequently encountered. In this paper, we propose a new KWS method that consists of a novel loss function, namely the maximization of the area under the receiver-operating-characteristic curve (AUC), and a confidence-based decision method. The proposed KWS method not only maintains high keyword classification accuracy, but is also robust to unseen sounds. Experimental results on the Google Speech Commands datasets v1 and v2 show that our method achieves state-of-the-art performance on most evaluation metrics.
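The abstract's core idea, replacing softmax cross-entropy with a loss that directly maximizes AUC so keyword scores separate cleanly from non-keyword sounds, can be illustrated with a minimal sketch. The one-vs-rest pairwise squared-hinge surrogate below is a common way to optimize AUC; it is an assumption for illustration, not the paper's exact formulation, and the function names are hypothetical.

```python
import numpy as np

def pairwise_auc_loss(scores_pos, scores_neg, margin=1.0):
    """Squared-hinge surrogate for the AUC between one keyword class
    (positives) and everything else (negatives): penalize every
    positive/negative pair whose score gap falls below the margin.
    Driving this loss to zero ranks all positives above all negatives,
    i.e. pushes the per-class AUC toward 1."""
    # All pairwise score differences, shape (n_pos, n_neg).
    diffs = scores_pos[:, None] - scores_neg[None, :]
    return float(np.mean(np.maximum(0.0, margin - diffs) ** 2))

def multiclass_auc_loss(logits, labels, num_keywords, margin=1.0):
    """Average the one-vs-rest AUC surrogate over keyword classes.
    By convention here, `labels == num_keywords` marks filler/unknown
    audio, which only ever contributes negative pairs."""
    losses = []
    for k in range(num_keywords):
        pos = logits[labels == k, k]   # keyword-k scores on its own clips
        neg = logits[labels != k, k]   # keyword-k scores on all other audio
        if len(pos) and len(neg):
            losses.append(pairwise_auc_loss(pos, neg, margin))
    return float(np.mean(losses))
```

Unlike cross-entropy, this objective is driven by the relative ranking of keyword versus non-keyword scores, which is why it pairs naturally with the confidence-based decision rule the abstract mentions: at inference, a keyword is accepted only if its score clears a threshold, so unseen sounds with low scores are rejected.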
Pages: 3278 - 3282
Page count: 5
Related Papers
37 items in total
  • [1] AUTOMATIC GAIN CONTROL AND MULTI-STYLE TRAINING FOR ROBUST SMALL-FOOTPRINT KEYWORD SPOTTING WITH DEEP NEURAL NETWORKS
    Prabhavalkar, Rohit
    Alvarez, Raziel
    Parada, Carolina
    Nakkiran, Preetum
    Sainath, Tara N.
    2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 4704 - 4708
  • [2] Small-Footprint Keyword Spotting with Multi-Scale Temporal Convolution
    Li, Ximin
    Wei, Xiaodong
    Qin, Xiaowei
    INTERSPEECH 2020, 2020, : 1987 - 1991
  • [3] EXPLORING REPRESENTATION LEARNING FOR SMALL-FOOTPRINT KEYWORD SPOTTING
    Cui, Fan
    Guo, Liyong
    Wang, Quandong
    Gao, Peng
    Wang, Yujun
    INTERSPEECH 2022, 2022, : 3258 - 3262
  • [4] SMALL-FOOTPRINT KEYWORD SPOTTING WITH GRAPH CONVOLUTIONAL NETWORK
    Chen, Xi
    Yin, Shouyi
    Song, Dandan
    Ouyang, Peng
    Liu, Leibo
    Wei, Shaojun
    2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 539 - 546
  • [5] DEEP RESIDUAL LEARNING FOR SMALL-FOOTPRINT KEYWORD SPOTTING
    Tang, Raphael
    Lin, Jimmy
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5484 - 5488
  • [6] Model compression applied to small-footprint keyword spotting
    Tucker, George
    Wu, Minhua
    Sun, Ming
    Panchapagesan, Sankaran
    Fu, Gengshen
    Vitaladevuni, Shiv
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 1878 - 1882
  • [7] SMALL-FOOTPRINT KEYWORD SPOTTING ON RAW AUDIO DATA WITH SINC-CONVOLUTIONS
    Mittermaier, Simon
    Kuerzinger, Ludwig
    Waschneck, Bernd
    Rigoll, Gerhard
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 7454 - 7458
  • [8] SMALL-FOOTPRINT KEYWORD SPOTTING USING DEEP NEURAL NETWORKS
    Chen, Guoguo
    Parada, Carolina
    Heigold, Georg
    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [9] Deep Template Matching for Small-footprint and Configurable Keyword Spotting
    Zhang, Peng
    Zhang, Xueliang
    INTERSPEECH 2020, 2020, : 2572 - 2576
  • [10] Convolutional Recurrent Neural Networks for Small-Footprint Keyword Spotting
    Arik, Sercan O.
    Kliegl, Markus
    Child, Rewon
    Hestness, Joel
    Gibiansky, Andrew
    Fougner, Chris
    Prenger, Ryan
    Coates, Adam
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 1606 - 1610