FOCAL LOSS AND DOUBLE-EDGE-TRIGGERED DETECTOR FOR ROBUST SMALL-FOOTPRINT KEYWORD SPOTTING

Cited by: 0
Authors
Liu, Bin [1, 2]
Nie, Shuai [1]
Zhang, Yaping [1, 2]
Liang, Shan [1]
Yang, Zhanlei [1]
Liu, Wenju [1]
Affiliations
[1] Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing, Peoples R China
Source
2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2019
Keywords
keyword spotting; focal loss; double-edge-triggered detecting method; speech recognition
DOI
Not available
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
A keyword spotting (KWS) system, which detects a specific keyword in a continuous stream of audio, is a critical component of human-computer interfaces. The goal of KWS is to provide high detection accuracy at a low false alarm rate with small memory and computation requirements. A DNN-based KWS system faces a large class imbalance during training because far less data is usually available for the keyword than for background speech; the background data overwhelms training and leads to a degenerate model. In this paper, we explore focal loss for training a small-footprint KWS system. Focal loss automatically down-weights the contribution of easy samples during training and focuses the model on hard samples, which naturally addresses the class imbalance and allows all available data to be used efficiently. Furthermore, many keywords of Chinese conversational assistants are repeated words owing to idiomatic usage, such as 'XIAO DU XIAO DU'. We propose a double-edge-triggered detection method for repeated keywords, which significantly reduces the false alarm rate relative to the single-threshold method. Systematic experiments demonstrate significant further improvements over the baseline system.
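The down-weighting of easy samples mentioned in the abstract can be illustrated with the focal loss as defined by Lin et al. (2017), FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t). The sketch below is a minimal binary (keyword vs. background) version. The function names, the binary framing, and the defaults gamma = 2.0 and alpha = 0.25 are assumptions taken from the original focal loss formulation, not settings reported in this record.

```python
import math


def cross_entropy(p, y, eps=1e-12):
    """Binary cross-entropy for one sample.

    p: predicted probability of the keyword class, y: label (1 or 0).
    """
    p_t = p if y == 1 else 1.0 - p
    return -math.log(min(max(p_t, eps), 1.0))


def focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Binary focal loss for one sample (Lin et al., 2017).

    gamma and alpha are illustrative defaults (assumption), not the
    values used for the KWS system described in the abstract.
    """
    # p_t is the probability assigned to the true class.
    p_t = p if y == 1 else 1.0 - p
    # alpha_t balances the keyword class against the background class.
    alpha_t = alpha if y == 1 else 1.0 - alpha
    p_t = min(max(p_t, eps), 1.0)
    # (1 - p_t)^gamma shrinks the loss of confidently correct samples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    for p in (0.95, 0.5, 0.1):
        ce = cross_entropy(p, 1)
        fl = focal_loss(p, 1)
        print(f"p={p:.2f}  CE={ce:.4f}  FL={fl:.6f}  FL/CE={fl / ce:.6f}")
```

With gamma = 0 the modulating factor disappears and the loss reduces to alpha-weighted cross-entropy. Larger gamma values suppress well-classified samples more strongly, which is the property the abstract relies on to handle abundant background speech.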
Pages: 6361-6365
Page count: 5