Robust Bayesian estimation for context-based speech enhancement

Cited by: 1
Authors
Naidu, Devireddy Hanumantha Rao [1 ]
Srinivasan, Sriram [2 ]
Affiliations
[1] Sri Sathya Sai Inst Higher Learning, Dept Math & Comp Sci, Anantapur 515134, Andhra Pradesh, India
[2] Microsoft Corp, Redmond, WA 98052 USA
Source
EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING | 2014
Keywords
Bayesian; Codebook; Context; Noise reduction; Speech enhancement; HIDDEN MARKOV-MODELS; NOISE; SUPPRESSION; ALGORITHM; TRACKING;
DOI
10.1186/s13636-014-0035-4
Chinese Library Classification (CLC)
O42 [Acoustics];
Discipline codes
070206; 082403;
Abstract
Model-based speech enhancement algorithms that employ trained models such as codebooks, hidden Markov models, and Gaussian mixture models, containing representations of speech such as linear predictive coefficients or mel-frequency cepstrum coefficients, have been found to be successful in enhancing noisy speech corrupted by nonstationary noise. However, these models are typically trained on speech data from multiple speakers under controlled acoustic conditions. In this paper, we introduce the notion of context-dependent models that are trained on speech data sharing one or more aspects of context, such as speaker, acoustic environment, or speaking style. In scenarios where the modeled and observed contexts match, context-dependent models can be expected to result in better performance, whereas context-independent models are preferred otherwise. We present a Bayesian framework that automatically provides the benefits of both models under varying contexts. As several aspects of the context remain constant over an extended period during usage, a memory-based approach that exploits information from past data is employed. As an example model-based approach, we use a codebook-based speech enhancement technique that employs trained models of speech and noise linear predictive coefficients. Using speaker, acoustic environment, and speaking style as aspects of context, we demonstrate the robustness of the proposed framework for different context scenarios, input signal-to-noise ratios, and numbers of contexts modeled.
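The abstract's core idea of weighting context-dependent and context-independent models by their posterior probabilities, with a memory over past frames because context changes slowly, can be illustrated with a minimal sketch. This is an illustrative construction, not the paper's algorithm: the function name, the scalar per-model estimates (spectra or LPC-domain estimates in practice), and the exponential-decay memory `decay` are all hypothetical.

```python
import math

def combine_estimates(estimates, log_likelihoods, memory=None, decay=0.9):
    """Posterior-weighted combination of per-model clean-speech estimates.

    estimates:       one estimate per model (scalars here for simplicity)
    log_likelihoods: log p(noisy frame | model) for each model
    memory:          accumulated past log-likelihoods; exploits the fact
                     that context (speaker, environment) varies slowly
    """
    if memory is None:
        memory = [0.0] * len(log_likelihoods)
    # Exponentially decayed accumulation: past frames keep influencing
    # the model posterior, implementing a simple memory-based approach.
    memory = [decay * m + ll for m, ll in zip(memory, log_likelihoods)]
    # Softmax over accumulated log-likelihoods (max-subtracted for stability).
    mx = max(memory)
    w = [math.exp(m - mx) for m in memory]
    total = sum(w)
    w = [x / total for x in w]
    # Bayesian model averaging: weight each model's estimate by its posterior.
    combined = sum(wi * ei for wi, ei in zip(w, estimates))
    return combined, memory, w
```

When the observed context matches the context-dependent model, that model's likelihood dominates over successive frames and the combined estimate converges toward its output; under a mismatched context the weights shift back to the context-independent model, which is the robustness behavior the abstract describes.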
Pages: 1-12 (12 pages)