Efficient MMSE Estimation and Uncertainty Processing for Multienvironment Robust Speech Recognition

Cited by: 9
Authors
Gonzalez, Jose A. [1 ]
Peinado, Antonio M. [1 ]
Gomez, Angel M. [1 ]
Carmona, Jose L. [1 ]
Affiliations
[1] Univ Granada, Dept Teoria Senal Telemat & Comunicac, E-18071 Granada, Spain
Keywords
Feature vector compensation; minimum mean square error (MMSE) estimation; robust speech recognition; stereo-data; CHANNEL ERROR MITIGATION; PACKET LOSS CONCEALMENT; NOISY ENVIRONMENTS; ENHANCEMENT;
DOI
10.1109/TASL.2010.2087753
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
This paper presents a feature compensation framework based on minimum mean square error (MMSE) estimation and stereo training data for robust speech recognition. In our proposal, we model the clean and noisy feature spaces in order to obtain clean feature estimates. However, unlike other well-known MMSE compensation methods such as SPLICE or MEMLIN, which model those spaces with Gaussian mixture models (GMMs), in our case every feature space is characterized by a set of prototype vectors, which can alternatively be regarded as a vector quantization (VQ) codebook. The discrete nature of this feature space characterization introduces two significant advantages. First, it allows the implementation of an MMSE estimator that is very efficient in terms of accuracy and computational cost. Second, time correlations can be exploited by means of hidden Markov modeling (HMM). In addition, a novel subregion-based modeling is applied in order to accurately represent the transformation between the clean and noisy domains. In order to deal with unknown environments, a multiple-model approach is also explored. Since this approach has been shown to be quite sensitive to incorrect environment classification, we adapt two uncertainty processing techniques, soft-data decoding and exponential weighting, to our estimation framework. As a result, environment misclassifications are concealed, allowing better performance under unknown environments. The experimental results on noisy digit recognition show a relative improvement of 87.93% in word accuracy over the baseline when clean acoustic models are used, while a relative improvement of 4.54% is achieved with multi-style trained models.
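The abstract does not give the estimator itself, so the following is only a minimal sketch of one plausible reading of a VQ-based MMSE compensator trained on stereo data. It assumes both codebooks are already available, that each noisy vector is hard-quantized to its nearest prototype, and that the clean estimate is the average of the clean prototypes weighted by P(k_x | k_y) counted from stereo pairs. All function names, the toy codebooks, and the fallback for unseen cells are hypothetical and are not taken from the paper; the HMM, subregion modeling, and uncertainty processing steps are not covered.

```python
def nearest(codebook, v):
    """Index of the prototype closest to v (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(codebook[k], v)),
    )


def train_lookup(clean_cb, noisy_cb, stereo_pairs):
    """Precompute a clean estimate for every noisy cell.

    stereo_pairs is a list of (clean_vector, noisy_vector) tuples.
    The entry for noisy cell k_y is sum_kx P(k_x | k_y) * clean_cb[k_x],
    with P(k_x | k_y) estimated by counting co-occurrences.
    """
    counts = [[0] * len(clean_cb) for _ in noisy_cb]
    for x, y in stereo_pairs:
        counts[nearest(noisy_cb, y)][nearest(clean_cb, x)] += 1

    dim = len(clean_cb[0])
    table = []
    for ky, row in enumerate(counts):
        total = sum(row)
        if total == 0:
            # Assumption: a noisy cell never seen in training falls back
            # to its own prototype, i.e. no compensation is applied.
            table.append(list(noisy_cb[ky]))
            continue
        table.append([
            sum(row[kx] / total * clean_cb[kx][d] for kx in range(len(clean_cb)))
            for d in range(dim)
        ])
    return table


def compensate(noisy_cb, table, y):
    """Estimate the clean vector for y with a single table lookup."""
    return table[nearest(noisy_cb, y)]


# Toy one-dimensional example (made-up numbers, for illustration only).
clean_cb = [[0.0], [10.0]]
noisy_cb = [[2.0], [12.0]]
stereo_pairs = [
    ([0.1], [2.2]),
    ([-0.2], [1.9]),
    ([9.8], [11.5]),
    ([10.3], [12.4]),
    ([0.0], [11.0]),  # a clean frame near 0 that noise pushed into cell 1
]
table = train_lookup(clean_cb, noisy_cb, stereo_pairs)
estimate = compensate(noisy_cb, table, [3.0])
```

Under these assumptions, all of the averaging happens at training time, so run-time compensation reduces to a nearest-prototype search plus one table lookup, which is one way to see why a discrete feature-space model can be computationally cheap.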
Pages: 1206-1220
Page count: 15
Related papers
50 in total
  • [31] Factorial Speech Processing Models for Noise-Robust Automatic Speech Recognition
    Khademian, Mahdi
    Homayounpour, Mohammad Mehdi
    2015 23RD IRANIAN CONFERENCE ON ELECTRICAL ENGINEERING (ICEE), 2015, : 637 - 642
  • [32] Combining speech enhancement with feature post-processing for robust speech recognition
    Lei, Jianjun
    Guo, Jun
    Liu, Gang
    Wang, Jian
    Nie, Xiangfei
    Yang, Zhen
    INTELLIGENT COMPUTING IN SIGNAL PROCESSING AND PATTERN RECOGNITION, 2006, 345 : 773 - 778
  • [33] AN MCMC APPROACH TO JOINT ESTIMATION OF CLEAN SPEECH AND NOISE FOR ROBUST SPEECH RECOGNITION
    Mushtaq, Aleem
    Lee, Chin-Hui
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 7107 - 7111
  • [34] MMSE-Based Missing-Feature Reconstruction With Temporal Modeling for Robust Speech Recognition
    Gonzalez, Jose A.
    Peinado, Antonio M.
    Ma, Ning
    Gomez, Angel M.
    Barker, Jon
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2013, 21 (03): : 624 - 635
  • [35] Joint Uncertainty Decoding With Predictive Methods for Noise Robust Speech Recognition
    Xu, Haitian
    Gales, Mark J. F.
    Chin, K. K.
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2011, 19 (06): : 1665 - 1676
  • [36] Towards efficient and scalable speech compression schemes for robust speech recognition applications
    Srinivasamurthy, N
    Ortega, A
    Zhu, Q
    Alwan, A
    2000 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, PROCEEDINGS VOLS I-III, 2000, : 249 - 252
  • [37] A Multichannel Feature-Based Processing for Robust Speech Recognition
    Souden, Mehrez
    Kinoshita, Keisuke
    Delcroix, Marc
    Nakatani, Tomohiro
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 696 - 699
  • [38] Propagation of Uncertainty through Multilayer Perceptrons for Robust Automatic Speech Recognition
    Astudillo, Ramon Fernandez
    da Silva Neto, Joao Paulo
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 468 - 471
  • [39] MMSE Log-Spectral Amplitude Estimation for Single Channel Speech Enhancement under Speech Presence Uncertainty by Weibull Speech Priors
    Bahrami, Mojtaba
    Seyedin, Sanaz
    26TH IRANIAN CONFERENCE ON ELECTRICAL ENGINEERING (ICEE 2018), 2018, : 749 - 754
  • [40] IMPULSE RESPONSE ESTIMATION FOR ROBUST SPEECH RECOGNITION IN A REVERBERANT ENVIRONMENT
    Ravanelli, Mirco
    Sosi, Alessandro
    Svaizer, Piergiorgio
    Omologo, Maurizio
    2012 PROCEEDINGS OF THE 20TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2012, : 1668 - 1672