Efficient MMSE Estimation and Uncertainty Processing for Multienvironment Robust Speech Recognition

Cited by: 9
Authors
Gonzalez, Jose A. [1 ]
Peinado, Antonio M. [1 ]
Gomez, Angel M. [1 ]
Carmona, Jose L. [1 ]
Affiliations
[1] Univ Granada, Dept Teoria Senal Telemat & Comunicac, E-18071 Granada, Spain
Keywords
Feature vector compensation; minimum mean square error (MMSE) estimation; robust speech recognition; stereo-data; CHANNEL ERROR MITIGATION; PACKET LOSS CONCEALMENT; NOISY ENVIRONMENTS; ENHANCEMENT;
DOI
10.1109/TASL.2010.2087753
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
This paper presents a feature compensation framework based on minimum mean square error (MMSE) estimation and stereo training data for robust speech recognition. In our proposal, we model the clean and noisy feature spaces in order to obtain clean feature estimates. However, unlike other well-known MMSE compensation methods such as SPLICE or MEMLIN, which model those spaces with Gaussian mixture models (GMMs), in our case every feature space is characterized by a set of prototype vectors, which can alternatively be considered a vector quantization (VQ) codebook. The discrete nature of this feature space characterization introduces two significant advantages. First, it allows the implementation of a very efficient MMSE estimator in terms of accuracy and computational cost. Second, time correlations can be exploited by means of hidden Markov modeling (HMM). In addition, a novel subregion-based modeling is applied in order to accurately represent the transformation between the clean and noisy domains. In order to deal with unknown environments, a multiple-model approach is also explored. Since this approach has been shown to be quite sensitive to incorrect environment classification, we adapt two uncertainty processing techniques, soft-data decoding and exponential weighting, to our estimation framework. As a result, environment misclassifications are concealed, allowing better performance under unknown environments. The experimental results on noisy digit recognition show a relative improvement of 87.93% in word accuracy over the baseline when clean acoustic models are used, while a 4.54% improvement is achieved with multi-style trained models.
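As a rough illustration of the idea in the abstract, the following minimal sketch shows how a VQ codebook and stereo (paired clean/noisy) training data could yield a table-lookup MMSE-style estimate. It is not the estimator from the paper: it assumes hard quantization of the noisy vector, a single environment, and no subregion or HMM modeling. All function names, the empty-cell fallback, and the toy data are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def nearest_cell(v: Vector, codebook: List[Vector]) -> int:
    """Return the index of the prototype closest to v (squared Euclidean distance)."""
    def sqdist(a: Vector, b: Vector) -> float:
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(range(len(codebook)), key=lambda k: sqdist(v, codebook[k]))


def train_cell_means(clean: List[Vector], noisy: List[Vector],
                     noisy_codebook: List[Vector]) -> List[List[float]]:
    """For each noisy VQ cell, average the clean vectors whose stereo
    (simultaneously recorded) noisy partner falls in that cell.
    This approximates E[x | cell] from the stereo training pairs."""
    dim = len(clean[0])
    sums = [[0.0] * dim for _ in noisy_codebook]
    counts = [0] * len(noisy_codebook)
    for x, y in zip(clean, noisy):
        k = nearest_cell(y, noisy_codebook)
        counts[k] += 1
        for d in range(dim):
            sums[k][d] += x[d]
    means = []
    for k, proto in enumerate(noisy_codebook):
        if counts[k]:
            means.append([s / counts[k] for s in sums[k]])
        else:
            # Assumption for this sketch: an empty cell falls back to
            # its own noisy prototype (no compensation).
            means.append(list(proto))
    return means


def estimate_clean(y: Vector, noisy_codebook: List[Vector],
                   cell_means: List[List[float]]) -> List[float]:
    """Hard-decision estimate: quantize y, then look up the stored clean mean."""
    return cell_means[nearest_cell(y, noisy_codebook)]


# Toy one-dimensional stereo data: the noisy channel adds an offset of 1.0.
clean_train = [[0.0], [0.5], [2.0], [3.0]]
noisy_train = [[1.0], [1.5], [3.0], [4.0]]
codebook = [[1.25], [3.5]]  # hypothetical two-entry noisy codebook

cell_means = train_cell_means(clean_train, noisy_train, codebook)
print(estimate_clean([1.4], codebook, cell_means))
print(estimate_clean([3.9], codebook, cell_means))
```

Because the per-cell means are computed once at training time, the run-time cost in this sketch is one nearest-prototype search plus a table lookup, which is one way to read the efficiency argument the abstract makes for a discrete feature-space model.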
Pages: 1206-1220
Number of pages: 15
Related papers
50 in total
  • [21] Processing of speech signals for robust recognition in practical environments
    Vishala Pannala
    CSI Transactions on ICT, 2017, 5 (2) : 167 - 178
  • [22] Issues with uncertainty decoding for noise robust automatic speech recognition
    Liao, H.
    Gales, M. J. F.
    SPEECH COMMUNICATION, 2008, 50 (04) : 265 - 277
  • [23] A supervised learning approach to uncertainty decoding for robust speech recognition
    Srinivasan, Soundararajan
    Wang, DeLiang
    2006 IEEE International Conference on Acoustics, Speech and Signal Processing, Vols 1-13, 2006, : 297 - 300
  • [24] Robust Speech Recognition for Similar Pronunciation Phrases Using MMSE under Noise Environments
    Watanabe, Masumi
    Tsutsui, Hiroshi
    Miyanaga, Yoshikazu
    2013 13TH INTERNATIONAL SYMPOSIUM ON COMMUNICATIONS AND INFORMATION TECHNOLOGIES (ISCIT): COMMUNICATION AND INFORMATION TECHNOLOGY FOR NEW LIFE STYLE BEYOND THE CLOUD, 2013, : 802 - 807
  • [25] MMSE-based stereo feature stochastic mapping for noise robust speech recognition
    Cui, Xiaodong
    Afify, Mohamed
    Gao, Yuqing
    2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 4077 - +
  • [26] Sequential estimation with optimal forgetting for robust speech recognition
    Afify, M
    Siohan, O
    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2004, 12 (01): : 19 - 26
  • [27] Statistical estimation of unreliable features for robust speech recognition
    Renevey, P
    Drygajlo, A
    2000 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS, VOLS I-VI, 2000, : 1731 - 1734
  • [28] Efficient Speaker and Noise Normalization for Robust Speech Recognition
    Joshi, Vikas
    Bilgi, Raghavendra
    Umesh, S.
    Benitez, C.
    Garcia, L.
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 2612 - 2615
  • [29] An efficient framework for robust mobile speech recognition services
    Rose, RC
    Arizmendi, I
    Parthasarathy, S
    2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS: SPEECH PROCESSING I, 2003, : 316 - 319
  • [30] Speech enhancement using MMSE estimation of amplitude and complex speech spectral coefficients under phase-uncertainty
    Kandagatla, Ravi Kumar
    Subbaiah, P. V.
    SPEECH COMMUNICATION, 2018, 96 : 10 - 27