Efficient MMSE Estimation and Uncertainty Processing for Multienvironment Robust Speech Recognition

Times cited: 9

Authors:

Gonzalez, Jose A. [1]

Peinado, Antonio M. [1]

Gomez, Angel M. [1]

Carmona, Jose L. [1]

Affiliation:

[1] Univ Granada, Dept Teoria Senal Telemat & Comunicac, E-18071 Granada, Spain

Source:

IEEE Transactions on Audio, Speech, and Language Processing | 2011 / Vol. 19 / No. 5

Keywords:

Feature vector compensation; minimum mean square error (MMSE) estimation; robust speech recognition; stereo-data; channel error mitigation; packet loss concealment; noisy environments; enhancement

DOI:

10.1109/TASL.2010.2087753

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

This paper presents a feature compensation framework for robust speech recognition based on minimum mean square error (MMSE) estimation and stereo training data. In our proposal, we model the clean and noisy feature spaces in order to obtain clean feature estimates. However, unlike other well-known MMSE compensation methods such as SPLICE or MEMLIN, which model those spaces with Gaussian mixture models (GMMs), in our case each feature space is characterized by a set of prototype vectors, which can alternatively be viewed as a vector quantization (VQ) codebook. The discrete nature of this characterization brings two significant advantages. First, it allows the implementation of an MMSE estimator that is very efficient in terms of both accuracy and computational cost. Second, time correlations can be exploited by means of hidden Markov modeling (HMM). In addition, a novel subregion-based modeling is applied in order to accurately represent the transformation between the clean and noisy domains. To deal with unknown environments, a multiple-model approach is also explored. Since this approach has been shown to be quite sensitive to incorrect environment classification, we adapt two uncertainty processing techniques, soft-data decoding and exponential weighting, to our estimation framework. As a result, environment misclassifications are concealed, yielding better performance under unknown environments. Experimental results on noisy digit recognition show relative improvements in word accuracy over the baseline of 87.93% when clean acoustic models are used and 4.54% with multi-style trained models.
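The core idea of a codebook-based MMSE estimator can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a noisy codebook is already given (for example, from k-means), pairs each noisy codeword with the mean of the stereo clean frames assigned to it by hard VQ, and computes codeword posteriors P(k|y) from an isotropic Gaussian around each noisy codeword with equal priors and a hypothetical shared `variance`. The clean estimate is then the posterior-weighted sum x_hat = sum_k P(k|y) * clean_mean_k. The HMM time modeling, subregion modeling and multi-environment processing described above are omitted, and all function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def nearest(y: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Index of the codeword closest to y (hard VQ assignment)."""
    return min(range(len(codebook)), key=lambda k: sq_dist(y, codebook[k]))


def train_stereo_codebook(noisy_codebook, noisy_frames, clean_frames) -> List[Vector]:
    """Average the clean frames whose stereo noisy counterparts map to each noisy codeword."""
    dim = len(clean_frames[0])
    sums = [[0.0] * dim for _ in noisy_codebook]
    counts = [0] * len(noisy_codebook)
    for y, x in zip(noisy_frames, clean_frames):
        k = nearest(y, noisy_codebook)
        counts[k] += 1
        for d in range(dim):
            sums[k][d] += x[d]
    # Assumed fallback: a codeword with no training data keeps its own (noisy) value.
    return [
        [s / counts[k] for s in sums[k]] if counts[k] else list(noisy_codebook[k])
        for k in range(len(noisy_codebook))
    ]


def posteriors(y: Sequence[float], noisy_codebook, variance: float) -> List[float]:
    """P(k|y) under an isotropic Gaussian per codeword with equal priors (assumption)."""
    logs = [-sq_dist(y, c) / (2.0 * variance) for c in noisy_codebook]
    m = max(logs)  # subtract the max log-likelihood for numerical stability
    w = [math.exp(l - m) for l in logs]
    z = sum(w)
    return [wi / z for wi in w]


def mmse_estimate(y: Sequence[float], noisy_codebook, clean_means, variance: float = 1.0) -> Vector:
    """Clean-feature estimate as the posterior-weighted sum of clean codeword means."""
    p = posteriors(y, noisy_codebook, variance)
    dim = len(clean_means[0])
    return [sum(p[k] * clean_means[k][d] for k in range(len(p))) for d in range(dim)]


if __name__ == "__main__":
    noisy_cb = [[0.0], [10.0]]
    clean_means = train_stereo_codebook(
        noisy_cb, [[0.5], [-0.5], [9.5], [10.5]], [[1.0], [3.0], [20.0], [22.0]]
    )
    print(clean_means)                                  # [[2.0], [21.0]]
    print(mmse_estimate([5.0], noisy_cb, clean_means))  # [11.5]
```

In the toy run, a noisy frame halfway between the two codewords receives equal posteriors, so its estimate is the average of the two clean means. Because the estimator only sums over a finite set of codewords, its cost grows linearly with codebook size, which is one way to read the efficiency claim above.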


Pages: 1206–1220

Page count: 15
