Late Reverberation Cancellation Using Bayesian Estimation of Multi-Channel Linear Predictors and Student's t-Source Prior

Cited by: 16

Authors:

Chetupalli, Srikanth Raj [1]

Sreenivas, Thippur V. [1]

Affiliations:

[1] Indian Inst Sci, Dept Elect & Commun Engn, Bangalore 560012, Karnataka, India

Source:

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2019 / Vol. 27 / No. 06

Keywords:

Dereverberation; linear prediction; Bayesian learning; variational inference; SPEECH DEREVERBERATION; QUALITY; SEPARATION; NOISE;

DOI:

10.1109/TASLP.2019.2906427

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

Multi-channel linear prediction (MCLP) can model the late reverberation in the short-time Fourier transform domain using a delayed linear predictor, and the prediction residual is taken as the desired early reflection component. Traditionally, a Gaussian source model with time-dependent precision (inverse of variance) is considered for the desired signal. In this paper, we propose a Student's t-distribution model for the desired signal, which is realized as a Gaussian source with a Gamma-distributed precision. Further, since the choice of a proper MCLP order is critical, we also place a Gaussian prior on the prediction coefficients and use a higher prediction order. We consider a batch estimation scenario and develop a variational Bayes expectation maximization (VBEM) algorithm for joint posterior inference and hyper-parameter estimation. This has led to more accurate and robust estimation of the late reverberation component and hence its cancellation, benefiting the desired residual signal estimation. Along with these stochastic models, we formulate single-channel output (MISO) and multi-channel output (MIMO) schemes using shared priors for the desired signal precision and the estimated MCLP coefficients at each microphone. Experiments using real room impulse responses show improved late reverberation suppression with the proposed VBEM approach over the traditional methods, for different room conditions. Additionally, we achieve a sparse coefficient vector for the MCLP, avoiding the criticality of manually choosing the model order. The MIMO formulation is easily extended to include spatial filtering of the enhanced signals, which further improves the estimation of the desired signal.
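The delayed-predictor idea in the abstract can be illustrated with a minimal NumPy sketch for a single frequency bin. It is not the paper's VBEM algorithm: it is a simplified alternating scheme that assumes a Gamma(a, b) prior on each frame's precision (the Gaussian scale-mixture view of the Student's t model) and a zero-mean Gaussian prior with precision `alpha` on the coefficients, and it uses only point estimates. The function names, the parameters `order`, `delay`, `a`, `b`, `alpha`, `ref`, and all default values are illustrative assumptions.

```python
import numpy as np


def delayed_frames(x, order, delay):
    """Stack the delayed past frames of all microphones.

    x: complex array of shape (n_frames, n_mics) for one frequency bin.
    Returns an array of shape (n_frames, order * n_mics) whose row n holds
    x[n - delay], ..., x[n - delay - order + 1]; missing history is zero.
    """
    n_frames, _ = x.shape
    blocks = []
    for k in range(order):
        lag = delay + k
        shifted = np.zeros_like(x)
        if lag < n_frames:
            shifted[lag:] = x[:n_frames - lag]
        blocks.append(shifted)
    return np.concatenate(blocks, axis=1)


def mclp_student_t(x, order=10, delay=2, a=1.0, b=1e-3,
                   alpha=1e-3, n_iter=5, ref=0):
    """Illustrative MISO dereverberation for one frequency bin.

    Alternates between (1) the posterior-mean precision of each frame under
    an assumed Gamma(a, b) prior and (2) a precision-weighted, ridge-
    regularised least-squares update of the prediction coefficients.
    Returns the prediction residual (desired-signal estimate) and the
    coefficient vector.
    """
    X = delayed_frames(x, order, delay)
    y = x[:, ref]
    d = y.copy()  # start from the reverberant reference channel
    g = np.zeros(X.shape[1], dtype=complex)
    for _ in range(n_iter):
        # Circular complex Gaussian likelihood with a Gamma(a, b) prior gives
        # a Gamma(a + 1, b + |d|^2) posterior; use its mean as the weight.
        gamma = (a + 1.0) / (b + np.abs(d) ** 2)
        Xw = X * gamma[:, None]
        A = X.conj().T @ Xw + alpha * np.eye(X.shape[1])
        g = np.linalg.solve(A, Xw.conj().T @ y)
        d = y - X @ g  # residual = estimated early-reflection component
    return d, g
```

Calling `mclp_student_t(x)` on the STFT frames of one frequency bin returns the residual for the reference microphone; a full system would repeat this per bin. The ridge term `alpha` only mimics the role of the coefficient prior. In the paper the prior hyper-parameters are estimated within VBEM rather than fixed by hand.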


Pages: 1007-1018

Page count: 12
