ROBUST DISCRIMINATIVE KEYWORD SPOTTING FOR EMOTIONALLY COLORED SPONTANEOUS SPEECH USING BIDIRECTIONAL LSTM NETWORKS

Cited by: 31
Authors
Woellmer, Martin [1]
Eyben, Florian [1]
Keshet, Joseph [2]
Graves, Alex [3]
Schuller, Bjoern [1]
Rigoll, Gerhard [1]
Affiliations
[1] Tech Univ Munich, Inst Human Machine Commun, D-8000 Munich, Germany
[2] Idiap Res Inst, Martigny, Switzerland
[3] Tech Univ Munich, Inst Comp Sci 6, D-80290 Munich, Germany
Source
2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-8, PROCEEDINGS | 2009
Keywords
Speech recognition; Robustness; Recurrent neural networks
DOI
10.1109/ICASSP.2009.4960492
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
In this paper we propose a new technique for robust keyword spotting that uses bidirectional Long Short-Term Memory (BLSTM) recurrent neural nets to incorporate contextual information in speech decoding. Our approach overcomes the drawbacks of generative HMM modeling by applying a discriminative learning procedure that non-linearly maps speech features into an abstract vector space. By incorporating the outputs of a BLSTM network into the speech features, it is able to make use of past and future context for phoneme predictions. The robustness of the approach is evaluated on a keyword spotting task using the HUMAINE Sensitive Artificial Listener (SAL) database, which contains accented, spontaneous, and emotionally colored speech. The test is particularly stringent because the system is not trained on the SAL database, but only on the TIMIT corpus of read speech. We show that our method prevails over a discriminative keyword spotter without BLSTM-enhanced feature functions, which in turn has been proven to outperform HMM-based techniques.
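
The abstract's central idea, that a bidirectional LSTM can use both past and future context when predicting the phoneme for a frame, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the names `LSTMLayer` and `blstm_posteriors`, the layer sizes, the bias-free gates, and the random untrained weights are hypothetical and are not taken from the paper. The abstract also does not say how the network outputs enter the keyword spotter's feature functions, so the sketch stops at per-frame phoneme posteriors.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Multiply matrix W (list of rows) by vector v.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class LSTMLayer:
    """One direction of an LSTM layer (hypothetical, untrained, no biases)."""

    def __init__(self, n_in, n_hidden, rng):
        self.n_hidden = n_hidden
        # One weight matrix per gate, applied to [input; previous hidden state].
        self.W = {g: rand_matrix(n_hidden, n_in + n_hidden, rng) for g in "ifoc"}

    def run(self, frames):
        h = [0.0] * self.n_hidden  # hidden state
        c = [0.0] * self.n_hidden  # memory cell state
        outputs = []
        for x in frames:
            z = list(x) + h
            i = [sigmoid(a) for a in matvec(self.W["i"], z)]    # input gate
            f = [sigmoid(a) for a in matvec(self.W["f"], z)]    # forget gate
            o = [sigmoid(a) for a in matvec(self.W["o"], z)]    # output gate
            g = [math.tanh(a) for a in matvec(self.W["c"], z)]  # candidate cell input
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
            outputs.append(h)
        return outputs


def softmax(v):
    m = max(v)
    e = [math.exp(a - m) for a in v]
    s = sum(e)
    return [a / s for a in e]


def blstm_posteriors(frames, n_hidden=4, n_phonemes=3, seed=0):
    """Return one phoneme posterior vector per frame from a bidirectional pass."""
    rng = random.Random(seed)
    n_in = len(frames[0])
    fwd = LSTMLayer(n_in, n_hidden, rng)
    bwd = LSTMLayer(n_in, n_hidden, rng)
    W_out = rand_matrix(n_phonemes, 2 * n_hidden, rng)
    h_fwd = fwd.run(frames)              # summarizes past context at each frame
    h_bwd = bwd.run(frames[::-1])[::-1]  # summarizes future context at each frame
    # Concatenate both directions per frame, then map to phoneme posteriors.
    return [softmax(matvec(W_out, hf + hb)) for hf, hb in zip(h_fwd, h_bwd)]


if __name__ == "__main__":
    frames = [[0.1, 0.2], [0.3, -0.1], [0.5, 0.4]]  # toy 2-dimensional features
    for t, p in enumerate(blstm_posteriors(frames)):
        print(t, [round(v, 4) for v in p])
```

Because the backward layer reads the utterance from the end, changing only the last frame alters the posterior at the first frame, whereas a forward-only layer's first output would stay the same. That dependence on future frames is what "past and future context" refers to in this sketch.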
Pages: 3949 / +
Number of pages: 2