Efficient MMSE Estimation and Uncertainty Processing for Multienvironment Robust Speech Recognition

Cited by: 9

Authors:

Gonzalez, Jose A. [1]

Peinado, Antonio M. [1]

Gomez, Angel M. [1]

Carmona, Jose L. [1]

Affiliation:

[1] Univ Granada, Dept Teoria Senal Telemat & Comunicac, E-18071 Granada, Spain

Source:

IEEE Transactions on Audio, Speech, and Language Processing | 2011 / Vol. 19 / No. 5

Keywords:

Feature vector compensation; minimum mean square error (MMSE) estimation; robust speech recognition; stereo-data; CHANNEL ERROR MITIGATION; PACKET LOSS CONCEALMENT; NOISY ENVIRONMENTS; ENHANCEMENT;

DOI:

10.1109/TASL.2010.2087753

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification code:

070206; 082403

Abstract:

This paper presents a feature compensation framework for robust speech recognition based on minimum mean square error (MMSE) estimation and stereo training data. In our proposal, we model the clean and noisy feature spaces in order to obtain clean feature estimates. However, unlike other well-known MMSE compensation methods such as SPLICE or MEMLIN, which model those spaces with Gaussian mixture models (GMMs), our method characterizes each feature space by a set of prototype vectors, which can alternatively be regarded as a vector quantization (VQ) codebook. The discrete nature of this characterization brings two significant advantages. First, it allows the implementation of an MMSE estimator that is very efficient in terms of both accuracy and computational cost. Second, time correlations can be exploited by means of hidden Markov modeling (HMM). In addition, a novel subregion-based modeling is applied to accurately represent the transformation between the clean and noisy domains. To deal with unknown environments, a multiple-model approach is also explored. Since this approach has been shown to be quite sensitive to incorrect environment classification, we adapt two uncertainty processing techniques, soft-data decoding and exponential weighting, to our estimation framework. As a result, environment misclassifications are concealed, yielding better performance under unknown environments. Experimental results on noisy digit recognition show a relative improvement in word accuracy of 87.93% over the baseline when clean acoustic models are used, and of 4.54% when multi-style trained models are used.
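The abstract describes MMSE estimation of clean features over a discrete (VQ codebook) characterization of the feature spaces, learned from stereo data. The Python sketch below illustrates only that general idea and is not the authors' estimator. It assumes that the noisy prototypes are learned with plain k-means, that each noisy cell is paired with the mean of its frame-aligned clean vectors, and that likelihoods are isotropic Gaussians with a hypothetical variance `sigma2`. It omits the paper's HMM time modeling, subregion modeling, multiple environments, and uncertainty processing. The function names are hypothetical.

```python
import numpy as np


def train_paired_codebook(noisy, clean, k, iters=20, seed=0):
    """Hypothetical sketch: learn k noisy prototypes with k-means (Lloyd
    iterations) and, from frame-aligned stereo data, the mean clean vector
    and prior probability of each noisy cell."""
    rng = np.random.default_rng(seed)
    protos = noisy[rng.choice(len(noisy), size=k, replace=False)].copy()
    for _ in range(iters):
        # Assign each noisy frame to its nearest prototype.
        d2 = ((noisy[:, None, :] - protos[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each prototype to the centroid of its assigned frames.
        for j in range(k):
            members = noisy[labels == j]
            if len(members):
                protos[j] = members.mean(axis=0)
    # Final assignment; pair each noisy cell with its clean centroid.
    d2 = ((noisy[:, None, :] - protos[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    clean_means = np.zeros((k, clean.shape[1]))
    priors = np.zeros(k)
    for j in range(k):
        mask = labels == j
        priors[j] = mask.mean()
        # Assumption: clean and noisy features share a dimension, so an
        # empty cell falls back to its own noisy prototype.
        clean_means[j] = clean[mask].mean(axis=0) if mask.any() else protos[j]
    priors = np.maximum(priors, 1e-12)
    return protos, clean_means, priors / priors.sum()


def mmse_estimate(y, protos, clean_means, priors, sigma2=1.0):
    """Posterior-weighted (MMSE-style) clean estimate for one noisy frame y,
    assuming isotropic Gaussian likelihoods around each noisy prototype."""
    d2 = ((protos - y) ** 2).sum(axis=1)
    log_post = np.log(priors) - d2 / (2.0 * sigma2)
    log_post -= log_post.max()  # subtract the max for numerical stability
    post = np.exp(log_post)
    post /= post.sum()
    # E[x | y] ~= sum_j P(j | y) * E[x | cell j]
    return post @ clean_means
```

Each frame costs one distance computation per prototype plus a weighted sum, so the cost grows with the codebook size. This is the kind of efficiency the abstract associates with a discrete feature-space characterization.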

Pages: 1206-1220

Number of pages: 15