NEURAL NETWORK BASED TIME-FREQUENCY MASKING AND STEERING VECTOR ESTIMATION FOR TWO-CHANNEL MVDR BEAMFORMING

Times cited: 0

Authors:

Liu, Yuzhou [1, 3]

Ganguly, Anshuman [2, 3]

Kamath, Krishna [3]

Kristjansson, Trausti [3]

Affiliations:

[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA

[2] Univ Texas Dallas, Dept Elect Engn, Dallas, TX USA

[3] Amazon Lab126, Sunnyvale, CA 94089 USA

Source:

2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2018

Keywords:

Two-channel speech enhancement; MVDR beamforming; steering vector; neural networks

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

We present a neural-network-based approach to two-channel beamforming. First, single- and cross-channel spectral features are extracted to form a feature map for each utterance. A large neural network formed by concatenating a convolutional neural network (CNN), a long short-term memory recurrent neural network (LSTM-RNN) and a deep neural network (DNN) is then employed to estimate frame-level speech and noise masks. These predicted masks are then used to compute cross-power spectral density (CPSD) matrices, from which the minimum variance distortionless response (MVDR) beamformer coefficients are estimated. Finally, a DNN is trained to optimize the phase of the estimated steering vectors to make them robust to reverberant conditions. We compare our method with two state-of-the-art two-channel speech enhancement systems, namely time-frequency masking and masking-based beamforming. Results show that the proposed method yields a 21% relative improvement in word error rate (WER) over these systems.
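As a minimal sketch of the mask-based MVDR step summarized in the abstract, the NumPy code below computes mask-weighted CPSD matrices per frequency, takes the principal eigenvector of the speech CPSD matrix as a steering-vector estimate d, and forms the MVDR weights w = Phi_nn^{-1} d / (d^H Phi_nn^{-1} d). This is the generic mask-based MVDR formulation, not the authors' implementation. The helper names (masked_cpsd, steering_vector, mvdr_weights, apply_beamformer), the principal-eigenvector steering estimate, the diagonal-loading constant, and the random masks standing in for the CNN/LSTM-RNN/DNN outputs are all assumptions for illustration. The paper's DNN-based phase optimization of the steering vector is not reproduced.

```python
import numpy as np


def masked_cpsd(Y, mask):
    """Hypothetical helper: mask-weighted CPSD matrix for each frequency.

    Y: complex STFT of shape (F, T, M) (M = 2 channels here).
    mask: real-valued mask of shape (F, T) with values in [0, 1].
    Returns an array of shape (F, M, M) of Hermitian matrices.
    """
    # Sum over frames of mask(f, t) * y(f, t) y(f, t)^H
    num = np.einsum("ft,ftm,ftn->fmn", mask, Y, Y.conj())
    # Normalize by the total mask weight per frequency (guard against zero)
    den = np.maximum(mask.sum(axis=1), 1e-10)[:, None, None]
    return num / den


def steering_vector(phi_ss):
    """Assumed estimate: principal eigenvector of each speech CPSD matrix."""
    _, vecs = np.linalg.eigh(phi_ss)  # eigenvalues in ascending order
    return vecs[:, :, -1]  # eigenvector of the largest eigenvalue, shape (F, M)


def mvdr_weights(phi_nn, d, eps=1e-6):
    """MVDR weights w = Phi_nn^{-1} d / (d^H Phi_nn^{-1} d) per frequency."""
    M = phi_nn.shape[-1]
    phi_nn = phi_nn + eps * np.eye(M)  # diagonal loading (assumed value)
    num = np.linalg.solve(phi_nn, d[..., None])[..., 0]  # Phi_nn^{-1} d, (F, M)
    den = np.einsum("fm,fm->f", d.conj(), num)  # d^H Phi_nn^{-1} d, (F,)
    return num / den[:, None]


def apply_beamformer(w, Y):
    """Beamformer output s_hat(f, t) = w(f)^H y(f, t), shape (F, T)."""
    return np.einsum("fm,ftm->ft", w.conj(), Y)


# Toy usage with random data; real masks would come from the trained network.
rng = np.random.default_rng(0)
F, T, M = 257, 100, 2
Y = rng.standard_normal((F, T, M)) + 1j * rng.standard_normal((F, T, M))
speech_mask = rng.uniform(size=(F, T))  # stand-in for predicted speech mask
noise_mask = 1.0 - speech_mask  # stand-in for predicted noise mask

phi_ss = masked_cpsd(Y, speech_mask)
phi_nn = masked_cpsd(Y, noise_mask)
d = steering_vector(phi_ss)
w = mvdr_weights(phi_nn, d)
S_hat = apply_beamformer(w, Y)
print(S_hat.shape)  # (257, 100)
```

By construction the weights satisfy the distortionless constraint w^H d = 1 at every frequency, so any error in d (for example, phase errors under reverberation) passes directly into the output; this is the quantity the paper's steering-vector refinement targets.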

Pages: 6717-6721

Number of pages: 5