Efficient MMSE Estimation and Uncertainty Processing for Multienvironment Robust Speech Recognition

Cited by: 9
Authors
Gonzalez, Jose A. [1]
Peinado, Antonio M. [1]
Gomez, Angel M. [1]
Carmona, Jose L. [1]
Affiliation
[1] Univ Granada, Dept Teoria Senal Telemat & Comunicac, E-18071 Granada, Spain
Keywords
Feature vector compensation; minimum mean square error (MMSE) estimation; robust speech recognition; stereo data; channel error mitigation; packet loss concealment; noisy environments; enhancement
DOI
10.1109/TASL.2010.2087753
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
This paper presents a feature compensation framework for robust speech recognition based on minimum mean square error (MMSE) estimation and stereo training data. In our proposal, we model the clean and noisy feature spaces in order to obtain clean feature estimates. However, unlike other well-known MMSE compensation methods such as SPLICE or MEMLIN, which model those spaces with Gaussian mixture models (GMMs), in our case each feature space is characterized by a set of prototype vectors, which can alternatively be regarded as a vector quantization (VQ) codebook. The discrete nature of this characterization brings two significant advantages. First, it allows the implementation of an MMSE estimator that is very efficient in terms of both accuracy and computational cost. Second, time correlations can be exploited by means of hidden Markov modeling (HMM). In addition, a novel subregion-based modeling is applied to accurately represent the transformation between the clean and noisy domains. To deal with unknown environments, a multiple-model approach is also explored. Since this approach has been shown to be quite sensitive to incorrect environment classification, we adapt two uncertainty processing techniques, soft-data decoding and exponential weighting, to our estimation framework. As a result, environment misclassifications are concealed, allowing better performance under unknown environments. Experimental results on noisy digit recognition show a relative improvement in word accuracy of 87.93% over the baseline when clean acoustic models are used, and of 4.54% when multi-style trained models are used.
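To make the idea of a codebook-based MMSE estimator more concrete, here is a minimal Python sketch in the spirit of the abstract; it is not the authors' implementation. Assumptions (none of them taken from the paper): features are plain lists of floats; the posterior of each noisy prototype is a softmax over negative squared distances with a hypothetical width `sigma`; the mapping P(x_i | y_j) between noisy and clean prototypes is estimated by counting nearest-prototype co-occurrences in stereo pairs; all function names are hypothetical. The two uncertainty processing rules are shown in a simple discrete form: soft-data decoding averages a state likelihood b_s over the clean prototypes weighted by their posterior, and exponential weighting raises the likelihood of the point estimate to an assumed reliability weight `gamma` in [0, 1]. The HMM time modeling, subregion modeling, and multiple-environment selection are left out.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def nearest(v, codebook):
    """Index of the closest prototype (hard VQ assignment)."""
    return min(range(len(codebook)), key=lambda k: sq_dist(v, codebook[k]))


def train_transitions(stereo_pairs, noisy_cb, clean_cb, floor=1e-3):
    """Hypothetical estimate of P(x_i | y_j) from (clean, noisy) stereo pairs:
    count nearest-prototype co-occurrences, add a small floor, normalize rows."""
    counts = [[floor] * len(clean_cb) for _ in noisy_cb]
    for x, y in stereo_pairs:
        counts[nearest(y, noisy_cb)][nearest(x, clean_cb)] += 1.0
    probs = []
    for row in counts:
        total = sum(row)
        probs.append([c / total for c in row])
    return probs


def noisy_posteriors(y, noisy_cb, sigma=1.0):
    """P(y_j | y): softmax of negative squared distances (assumed kernel)."""
    scores = [-sq_dist(y, c) / (2.0 * sigma ** 2) for c in noisy_cb]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def mmse_estimate(y, noisy_cb, clean_cb, trans, sigma=1.0):
    """x_hat = sum_i x_i P(x_i | y), with P(x_i | y) = sum_j P(x_i | y_j) P(y_j | y)."""
    post_y = noisy_posteriors(y, noisy_cb, sigma)
    post_x = [sum(trans[j][i] * post_y[j] for j in range(len(noisy_cb)))
              for i in range(len(clean_cb))]
    dim = len(clean_cb[0])
    x_hat = [sum(p * x[d] for p, x in zip(post_x, clean_cb)) for d in range(dim)]
    return x_hat, post_x


def soft_data_loglik(state_pdf, clean_cb, post_x):
    """Soft-data decoding (discrete form): log sum_i b_s(x_i) P(x_i | y)."""
    return math.log(sum(p * state_pdf(x) for x, p in zip(clean_cb, post_x)))


def exp_weighted_loglik(state_pdf, x_hat, gamma):
    """Exponential weighting: gamma * log b_s(x_hat), gamma an assumed reliability in [0, 1]."""
    return gamma * math.log(state_pdf(x_hat))


if __name__ == "__main__":
    clean_cb = [[0.0], [1.0]]  # toy 1-D clean prototypes
    noisy_cb = [[0.5], [2.0]]  # toy 1-D noisy prototypes
    pairs = [([0.0], [0.4]), ([0.1], [0.6]), ([1.0], [2.1]), ([0.9], [1.9])]
    trans = train_transitions(pairs, noisy_cb, clean_cb)
    x_hat, post_x = mmse_estimate([2.0], noisy_cb, clean_cb, trans, sigma=0.5)
    print(x_hat, post_x)
```

Because the feature spaces are discrete, the estimate reduces to a weighted sum over a fixed set of clean prototypes, which is one way to see why such an estimator can be cheap to compute; the soft-data rule reuses the same posteriors instead of committing to a single point estimate.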
Pages: 1206-1220
Number of pages: 15
Related papers
  • [1] EFFICIENT VQ-BASED MMSE ESTIMATION FOR ROBUST SPEECH RECOGNITION
    Gonzalez, Jose A.
    Peinado, Antonio M.
    Gomez, Angel M.
    Carmona, Jose L.
    Morales-Cordovilla, Juan A.
    2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 4558 - 4561
  • [2] Use of speech presence uncertainty with MMSE spectral energy estimation for robust automatic speech recognition
    Stark, Anthony
    Paliwal, Kuldip
    SPEECH COMMUNICATION, 2011, 53 (01) : 51 - 61
  • [3] MMSE estimation of log-filterbank energies for robust speech recognition
    Stark, Anthony
    Paliwal, Kuldip
    SPEECH COMMUNICATION, 2011, 53 (03) : 403 - 416
  • [4] MMSE Estimation of Speech Power Spectral Density Under Speech Presence Uncertainty for Automatic Speech Recognition
    Liu, Jingang
    Zhou, Yi
    Ma, Yongbao
    Liu, Hongqing
    2016 IEEE INTERNATIONAL CONFERENCE ON DIGITAL SIGNAL PROCESSING (DSP), 2016, : 412 - 416
  • [5] Speech enhancement using MMSE estimation under phase uncertainty
    Kandagatla, R.
    Subbaiah, P. V.
    International Journal of Speech Technology, 2017, 20 (2) : 373 - 385
  • [6] Comparison of Estimation Techniques in Joint Uncertainty Decoding for Noise Robust Speech Recognition
    Xu, Haitian
    Chin, K. K.
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 2363 - 2366
  • [7] Uncertainty estimation for a speech recognition system
    Morales-Munoz, Walter
    Calderon-Ramirez, Saul
    TECNOLOGIA EN MARCHA, 2024, 37 : 97 - 103
  • [8] A MMSE Estimator in Mel-Cepstral Domain for Robust Large Vocabulary Automatic Speech Recognition using Uncertainty Propagation
    Astudillo, Ramon Fernandez
    Orglmeister, Reinhold
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2, 2010, : 713 - 716
  • [9] Signal processing techniques for robust speech recognition
    Asano, Futoshi
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2008, E91D (03): 393 - 401
  • [10] BINAURAL PROCESSING FOR ROBUST RECOGNITION OF DEGRADED SPEECH
    Menon, Anjali
    Kim, Chanwoo
    Kurokawa, Umpei
    Stern, Richard M.
    2017 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2017, : 24 - 31