Robust Keyword Spotting for Noisy Environments by Leveraging Speech Enhancement and Speech Presence Probability

Cited by: 1
Authors
Yang, Chouchang [1]
Saidutta, Yashas Malur [1]
Srinivasa, Rakshith Sharma [1]
Lee, Ching-Hua [1]
Shen, Yilin [1]
Jin, Hongxia [1]
Affiliations
[1] Samsung Res Amer, Mountain View, CA 94043 USA
Source
INTERSPEECH 2023 | 2023
Keywords
keyword spotting; speech commands; speech presence probability; noise robust; speech enhancement;
DOI
10.21437/Interspeech.2023-2222
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Although various deep keyword spotting (KWS) systems have demonstrated promising performance in relatively noise-free environments, accurate keyword detection in the presence of strong noise remains challenging. Room acoustics and noise conditions can be highly diverse, leading to drastic performance degradation if not handled carefully. In this paper, we propose a noise management front-end called SE-SPP Net that performs speech enhancement (SE) and speech presence probability (SPP) estimation jointly for robust KWS in noise. The SE-SPP Net estimates both the denoised Mel spectrogram and the position of the speech utterance in the noisy signal, where the latter is estimated as the probability that a particular time-frequency bin contains speech. Further, it adds almost no cost in model size compared to a model that estimates only the denoised speech. Our SE-SPP Net can improve noisy KWS performance by up to 7% compared to a similarly sized state-of-the-art model at an SNR of -10 dB.
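The two quantities named in the abstract, a mixture at a given SNR and a per-bin speech presence value, can be illustrated with a minimal NumPy sketch. This is not the authors' SE-SPP Net or their training target. The helper names `mix_at_snr` and `speech_presence_target` are hypothetical, and the soft label is a simple energy-ratio quantity (similar in form to an ideal ratio mask) chosen for illustration, since the abstract does not define how the SPP target is computed. White noise stands in for real speech and noise recordings.

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db`, then add it."""
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # Noise power required to reach the requested SNR (in dB).
    target_noise_power = speech_power / (10.0 ** (snr_db / 10.0))
    scaled_noise = noise * np.sqrt(target_noise_power / noise_power)
    return speech + scaled_noise, scaled_noise


def speech_presence_target(speech_power_tf, noise_power_tf, eps=1e-12):
    """Hypothetical soft label in [0, 1]: share of each bin's energy that is speech.

    Assumption for illustration only; the paper's SPP definition may differ.
    """
    return speech_power_tf / (speech_power_tf + noise_power_tf + eps)


# Stand-in signals (assumption): one second of white noise at 16 kHz each.
rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)
noise = rng.standard_normal(16000)

noisy, scaled_noise = mix_at_snr(speech, noise, snr_db=-10.0)
measured_snr_db = 10.0 * np.log10(np.mean(speech ** 2) / np.mean(scaled_noise ** 2))

# Toy time-frequency powers: a bin with equal speech and noise, and a noise-only bin.
toy_spp = speech_presence_target(np.array([1.0, 0.0]), np.array([1.0, 1.0]))
```

At -10 dB the scaled noise carries ten times the power of the speech, which is the condition under which the abstract reports its largest gain.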
Pages: 1638-1642
Page count: 5