ROBUST DISCRIMINATIVE KEYWORD SPOTTING FOR EMOTIONALLY COLORED SPONTANEOUS SPEECH USING BIDIRECTIONAL LSTM NETWORKS

Cited by: 31
Authors
Woellmer, Martin [1]
Eyben, Florian [1]
Keshet, Joseph [2]
Graves, Alex [3]
Schuller, Bjoern [1]
Rigoll, Gerhard [1]
Affiliations
[1] Tech Univ Munich, Inst Human Machine Commun, D-8000 Munich, Germany
[2] Idiap Res Inst, Martigny, Switzerland
[3] Tech Univ Munich, Inst Comp Sci 6, D-80290 Munich, Germany
Source
2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-8, PROCEEDINGS | 2009
Keywords
Speech recognition; Robustness; Recurrent neural networks;
DOI
10.1109/ICASSP.2009.4960492
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
In this paper we propose a new technique for robust keyword spotting that uses bidirectional Long Short-Term Memory (BLSTM) recurrent neural nets to incorporate contextual information in speech decoding. Our approach overcomes the drawbacks of generative HMM modeling by applying a discriminative learning procedure that non-linearly maps speech features into an abstract vector space. By incorporating the outputs of a BLSTM network into the speech features, the keyword spotter is able to make use of past and future context for phoneme predictions. The robustness of the approach is evaluated on a keyword spotting task using the HUMAINE Sensitive Artificial Listener (SAL) database, which contains accented, spontaneous, and emotionally colored speech. The test is particularly stringent because the system is not trained on the SAL database, but only on the TIMIT corpus of read speech. We show that our method prevails over a discriminative keyword spotter without BLSTM-enhanced feature functions, which in turn has been proven to outperform HMM-based techniques.
Pages: 3949 / +
Page count: 2