NEURAL NETWORK BASED TIME-FREQUENCY MASKING AND STEERING VECTOR ESTIMATION FOR TWO-CHANNEL MVDR BEAMFORMING

Cited by: 0

Authors:

Liu, Yuzhou [1,3]

Ganguly, Anshuman [2,3]

Kamath, Krishna [3]

Kristjansson, Trausti [3]

Affiliations:

[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA

[2] Univ Texas Dallas, Dept Elect Engn, Dallas, TX USA

[3] Amazon Lab126, Sunnyvale, CA 94089 USA

Source:

2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2018

Keywords:

Two-channel speech enhancement; MVDR beamforming; steering vector; neural networks

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

We present a neural-network-based approach to two-channel beamforming. First, single- and cross-channel spectral features are extracted to form a feature map for each utterance. A large neural network, formed by concatenating a convolutional neural network (CNN), a long short-term memory recurrent neural network (LSTM-RNN), and a deep neural network (DNN), is then employed to estimate frame-level speech and noise masks. These predicted masks are then used to compute cross-power spectral density (CPSD) matrices, from which the minimum variance distortionless response (MVDR) beamformer coefficients are estimated. Finally, a DNN is trained to optimize the phase of the estimated steering vectors to make them robust in reverberant conditions. We compare our method with two state-of-the-art two-channel speech enhancement systems: time-frequency masking and masking-based beamforming. Results show that the proposed method yields a 21% relative improvement in word error rate (WER) over the other systems.
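
The mask-to-beamformer step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the masks are taken as given inputs rather than predicted by the CNN/LSTM-RNN/DNN stack, the steering vector is assumed to be the principal eigenvector of the speech CPSD matrix (one common choice, not confirmed by this record), and the DNN-based phase refinement is omitted. The function names `cpsd`, `mvdr_weights`, and `apply_beamformer`, and the `diag_load` parameter, are hypothetical.

```python
import numpy as np


def cpsd(stft, mask):
    """Mask-weighted cross-power spectral density (CPSD) matrices.

    stft: complex array, shape (channels, frames, freqs)
    mask: real array in [0, 1], shape (frames, freqs)
    returns: complex array, shape (freqs, channels, channels)
    """
    weighted = stft * mask[None, :, :]
    # Sum over frames of m(t, f) * x(t, f) x(t, f)^H for every frequency bin.
    phi = np.einsum("ctf,dtf->fcd", weighted, stft.conj())
    # Normalise by the total mask weight per frequency bin.
    norm = np.maximum(mask.sum(axis=0), 1e-8)
    return phi / norm[:, None, None]


def mvdr_weights(phi_speech, phi_noise, diag_load=1e-6):
    """MVDR weights w(f) = Phi_n^{-1} d / (d^H Phi_n^{-1} d) per frequency bin."""
    n_freq, n_ch, _ = phi_noise.shape
    weights = np.zeros((n_freq, n_ch), dtype=complex)
    for f in range(n_freq):
        # Assumption: steering vector d = principal eigenvector of the speech CPSD.
        _, vecs = np.linalg.eigh(phi_speech[f])
        d = vecs[:, -1]
        # Small diagonal loading keeps the noise matrix well conditioned.
        load = diag_load * np.trace(phi_noise[f]).real / n_ch
        noise = phi_noise[f] + load * np.eye(n_ch)
        numerator = np.linalg.solve(noise, d)
        weights[f] = numerator / (d.conj() @ numerator)
    return weights


def apply_beamformer(weights, stft):
    """Beamformer output y(t, f) = w(f)^H x(t, f), shape (frames, freqs)."""
    return np.einsum("fc,ctf->tf", weights.conj(), stft)
```

In this sketch the speech and noise masks would each be passed to `cpsd` together with the two-channel STFT, and the resulting matrices to `mvdr_weights`. By construction the weights satisfy the distortionless constraint w(f)^H d(f) = 1 for the steering vector used.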


Pages: 6717-6721

Page count: 5

