Probabilistic speech feature extraction with context-sensitive Bottleneck neural networks

Cited by: 3
Authors
Woellmer, Martin [1]
Schuller, Bjoern [1]
Affiliations
[1] Tech Univ Munich, Inst Human Machine Commun, D-80333 Munich, Germany
Keywords
Probabilistic feature extraction; Bottleneck networks; Long Short-Term Memory; Bidirectional speech processing; CONNECTIONIST FEATURE-EXTRACTION; BIDIRECTIONAL LSTM; NECK FEATURES
DOI
10.1016/j.neucom.2012.06.064
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We introduce a novel context-sensitive feature extraction approach for spontaneous speech recognition. Because bidirectional Long Short-Term Memory (BLSTM) networks are known to improve phoneme recognition accuracy by incorporating long-range contextual information into speech decoding, we integrate the BLSTM principle into a Tandem front-end for probabilistic feature extraction. Unlike previously proposed approaches, which exploit BLSTM modeling by generating a discrete phoneme prediction feature, our feature extractor merges continuous high-level probabilistic BLSTM features with low-level features. By combining BLSTM modeling and Bottleneck (BN) feature generation, we propose a novel front-end that produces context-sensitive probabilistic feature vectors of arbitrary size, independent of the network training targets. Evaluations on challenging spontaneous, conversational speech recognition tasks show that this concept outperforms recently published architectures for feature-level context modeling. © 2013 Elsevier B.V. All rights reserved.
Pages: 113-120
Number of pages: 8