Efficient MMSE Estimation and Uncertainty Processing for Multienvironment Robust Speech Recognition

Cited by: 9

Authors:

Gonzalez, Jose A. [1]

Peinado, Antonio M. [1]

Gomez, Angel M. [1]

Carmona, Jose L. [1]

Affiliation:

[1] Univ Granada, Dept Teoria Senal Telemat & Comunicac, E-18071 Granada, Spain

Source:

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2011 / Vol. 19 / No. 05

Keywords:

Feature vector compensation; minimum mean square error (MMSE) estimation; robust speech recognition; stereo-data; CHANNEL ERROR MITIGATION; PACKET LOSS CONCEALMENT; NOISY ENVIRONMENTS; ENHANCEMENT;

DOI:

10.1109/TASL.2010.2087753

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

This paper presents a feature compensation framework based on minimum mean square error (MMSE) estimation and stereo training data for robust speech recognition. In our proposal, we model the clean and noisy feature spaces in order to obtain clean feature estimates. However, unlike other well-known MMSE compensation methods such as SPLICE or MEMLIN, which model those spaces with Gaussian mixture models (GMMs), in our case every feature space is characterized by a set of prototype vectors, which can alternatively be considered a vector quantization (VQ) codebook. The discrete nature of this feature space characterization introduces two significant advantages. First, it allows the implementation of a very efficient MMSE estimator in terms of accuracy and computational cost. Second, time correlations can be exploited by means of hidden Markov modeling (HMM). In addition, a novel subregion-based modeling is applied in order to accurately represent the transformation between the clean and noisy domains. In order to deal with unknown environments, a multiple-model approach is also explored. Since this approach has been shown to be quite sensitive to incorrect environment classification, we adapt two uncertainty processing techniques, soft-data decoding and exponential weighting, to our estimation framework. As a result, environment misclassifications are concealed, allowing better performance under unknown environments. The experimental results on noisy digit recognition show a relative improvement of 87.93% in word accuracy over the baseline when clean acoustic models are used, while 4.54% is achieved with multi-style trained models.


Pages: 1206-1220

Number of pages: 15

50 items in total

[31] Factorial Speech Processing Models for Noise-Robust Automatic Speech Recognition
Khademian, Mahdi
Homayounpour, Mohammad Mehdi
2015 23RD IRANIAN CONFERENCE ON ELECTRICAL ENGINEERING (ICEE), 2015: 637-642
[32] Combining speech enhancement with feature post-processing for robust speech recognition
Lei, Jianjun
Guo, Jun
Liu, Gang
Wang, Jian
Nie, Xiangfei
Yang, Zhen
INTELLIGENT COMPUTING IN SIGNAL PROCESSING AND PATTERN RECOGNITION, 2006, 345: 773-778
[33] AN MCMC APPROACH TO JOINT ESTIMATION OF CLEAN SPEECH AND NOISE FOR ROBUST SPEECH RECOGNITION
Mushtaq, Aleem
Lee, Chin-Hui
2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013: 7107-7111
[34] MMSE-Based Missing-Feature Reconstruction With Temporal Modeling for Robust Speech Recognition
Gonzalez, Jose A.
Peinado, Antonio M.
Ma, Ning
Gomez, Angel M.
Barker, Jon
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2013, 21(03): 624-635
[35] Joint Uncertainty Decoding With Predictive Methods for Noise Robust Speech Recognition
Xu, Haitian
Gales, Mark J. F.
Chin, K. K.
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2011, 19(06): 1665-1676
[36] Towards efficient and scalable speech compression schemes for robust speech recognition applications
Srinivasamurthy, N
Ortega, A
Zhu, Q
Alwan, A
2000 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, PROCEEDINGS VOLS I-III, 2000: 249-252
[37] A Multichannel Feature-Based Processing for Robust Speech Recognition
Souden, Mehrez
Kinoshita, Keisuke
Delcroix, Marc
Nakatani, Tomohiro
12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011: 696-699
[38] Propagation of Uncertainty through Multilayer Perceptrons for Robust Automatic Speech Recognition
Astudillo, Ramon Fernandez
da Silva Neto, Joao Paulo
12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011: 468-471
[39] MMSE Log-Spectral Amplitude Estimation for Single Channel Speech Enhancement under Speech Presence Uncertainty by Weibull Speech Priors
Bahrami, Mojtaba
Seyedin, Sanaz
26TH IRANIAN CONFERENCE ON ELECTRICAL ENGINEERING (ICEE 2018), 2018: 749-754
[40] IMPULSE RESPONSE ESTIMATION FOR ROBUST SPEECH RECOGNITION IN A REVERBERANT ENVIRONMENT
Ravanelli, Mirco
Sosi, Alessandro
Svaizer, Piergiorgio
Omologo, Maurizio
2012 PROCEEDINGS OF THE 20TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2012: 1668-1672
