NEURAL NETWORK BASED TIME-FREQUENCY MASKING AND STEERING VECTOR ESTIMATION FOR TWO-CHANNEL MVDR BEAMFORMING

Authors
Liu, Yuzhou [1 ,3 ]
Ganguly, Anshuman [2 ,3 ]
Kamath, Krishna [3 ]
Kristjansson, Trausti [3 ]
Affiliations
[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
[2] Univ Texas Dallas, Dept Elect Engn, Dallas, TX USA
[3] Amazon Lab126, Sunnyvale, CA 94089 USA
Source
2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2018
Keywords
Two-channel speech enhancement; MVDR beamforming; steering vector; neural networks
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
We present a neural-network-based approach to two-channel beamforming. First, single- and cross-channel spectral features are extracted to form a feature map for each utterance. A large neural network formed by concatenating a convolutional neural network (CNN), a long short-term memory recurrent neural network (LSTM-RNN), and a deep neural network (DNN) is then used to estimate frame-level speech and noise masks. These predicted masks are then used to compute cross-power spectral density (CPSD) matrices, from which the minimum variance distortionless response (MVDR) beamformer coefficients are estimated. Finally, a DNN is trained to optimize the phase of the estimated steering vectors, making them robust to reverberant conditions. We compare our method with two state-of-the-art two-channel speech enhancement systems: time-frequency masking and masking-based beamforming. Results show that the proposed method yields a 21% relative improvement in word error rate (WER) over the other systems.
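The mask-to-beamformer step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the masks are assumed to be given (the CNN/LSTM/DNN mask estimator is out of scope), the steering vector is taken as the principal eigenvector of the speech CPSD matrix (a common choice that the abstract does not confirm), and the DNN-based phase refinement is omitted. All function names, array layouts, and the diagonal-loading constant are hypothetical.

```python
import numpy as np

def masked_cpsd(stft, mask, eps=1e-8):
    """Mask-weighted CPSD matrices.

    stft: complex array, shape (channels, frames, freqs)  [assumed layout]
    mask: real array in [0, 1], shape (frames, freqs)
    Returns an array of shape (freqs, channels, channels).
    """
    # Sum over frames of mask * x x^H, computed per frequency bin.
    num = np.einsum("tf,ctf,dtf->fcd", mask, stft, stft.conj())
    den = np.maximum(mask.sum(axis=0), eps)
    return num / den[:, None, None]

def steering_vector(phi_speech):
    """Principal eigenvector of the speech CPSD per frequency (assumption)."""
    _, vecs = np.linalg.eigh(phi_speech)   # eigenvalues in ascending order
    d = vecs[..., -1]                      # shape (freqs, channels)
    # Rotate so the reference channel (index 0) has zero phase.
    return d * np.exp(-1j * np.angle(d[:, :1]))

def mvdr_weights(phi_noise, d, diag_load=1e-6):
    """w = Phi_n^{-1} d / (d^H Phi_n^{-1} d), per frequency bin."""
    n_ch = phi_noise.shape[-1]
    # Diagonal loading (hypothetical constant) keeps the solve well conditioned.
    trace = np.trace(phi_noise, axis1=-2, axis2=-1).real
    loaded = phi_noise + (diag_load * trace / n_ch)[:, None, None] * np.eye(n_ch)
    num = np.linalg.solve(loaded, d[..., None])[..., 0]   # (freqs, channels)
    den = np.einsum("fc,fc->f", d.conj(), num)
    return num / den[:, None]

def apply_beamformer(w, stft):
    """y[t, f] = w[f]^H x[t, f]; returns shape (frames, freqs)."""
    return np.einsum("fc,ctf->tf", w.conj(), stft)

# Toy usage with random data standing in for a two-channel STFT and masks.
rng = np.random.default_rng(0)
stft = rng.standard_normal((2, 50, 9)) + 1j * rng.standard_normal((2, 50, 9))
speech_mask = rng.uniform(size=(50, 9))
noise_mask = 1.0 - speech_mask

phi_s = masked_cpsd(stft, speech_mask)
phi_n = masked_cpsd(stft, noise_mask)
d = steering_vector(phi_s)
w = mvdr_weights(phi_n, d)
enhanced = apply_beamformer(w, stft)
```

By construction the weights satisfy the distortionless constraint w^H d = 1 in every frequency bin, so the quality of the result depends on how well the masks and the steering vector are estimated, which is the part the paper addresses with neural networks.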
Pages: 6717 - 6721
Page count: 5
Related papers
50 in total
  • [1] Robust DOA Estimation Based on Convolutional Neural Network and Time-Frequency Masking
    Zhang, Wangyou
    Zhou, Ying
    Qian, Yanmin
    INTERSPEECH 2019, 2019, : 2703 - 2707
  • [2] Two-channel time-frequency audio watermarking
    Hertanto, Richard Nathaniel
    Foo, Say-Wei
    2007 6TH INTERNATIONAL CONFERENCE ON INFORMATION, COMMUNICATIONS & SIGNAL PROCESSING, VOLS 1-4, 2007, : 886 - 889
  • [3] ON TIME-FREQUENCY MASK ESTIMATION FOR MVDR BEAMFORMING WITH APPLICATION IN ROBUST SPEECH RECOGNITION
    Xiao, Xiong
    Zhao, Shengkui
    Jones, Douglas L.
    Chng, Eng Siong
    Li, Haizhou
    2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 3246 - 3250
  • [4] Improve the robustness of MVDR beamforming method based on steering vector estimation and sparse constraint
    Ibrahim, K. N.
    Khalil, Elie
    2019 INTERNATIONAL SYMPOSIUM ON ADVANCED ELECTRICAL AND COMMUNICATION TECHNOLOGIES (ISAECT), 2019,
  • [5] New Designs on MVDR Robust Adaptive Beamforming Based on Optimal Steering Vector Estimation
    Huang, Yongwei
    Zhou, Mingkang
    Vorobyov, Sergiy A.
    IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2019, 67 (14) : 3624 - 3638
  • [6] Robust TDOA Estimation Based on Time-Frequency Masking and Deep Neural Networks
    Wang, Zhong-Qiu
    Zhang, Xueliang
    Wang, DeLiang
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 322 - 326
  • [7] ONLINE MEETING RECOGNITION IN NOISY ENVIRONMENTS WITH TIME-FREQUENCY MASK BASED MVDR BEAMFORMING
    Araki, Shoko
    Ito, Nobutaka
    Delcroix, Marc
    Ogawa, Atsunori
    Kinoshita, Keisuke
    Higuchi, Takuya
    Yoshioka, Takuya
    Dung Tran
    Karita, Shigeki
    Nakatani, Tomohiro
    2017 HANDS-FREE SPEECH COMMUNICATIONS AND MICROPHONE ARRAYS (HSCMA 2017), 2017, : 16 - 20
  • [8] Impact of phase estimation on single-channel speech separation based on time-frequency masking
    Mayer, Florian
    Williamson, Donald S.
    Mowlaee, Pejman
    Wang, DeLiang
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2017, 141 (06) : 4668 - 4679
  • [9] Dual channel neural network speech enhancement algorithm based on time frequency masking
    Jia, Hairong
    Mei, Shulin
    Zhang, Min
    Huazhong Keji Daxue Xuebao (Ziran Kexue Ban)/Journal of Huazhong University of Science and Technology (Natural Science Edition), 2021, 49 (06) : 43 - 49
  • [10] Multi-Channel Bin-Wise Speech Separation Combining Time-Frequency Masking and Beamforming
    Bella, Mostafa
    Saylani, Hicham
    Hosseini, Shahram
    Deville, Yannick
    IEEE ACCESS, 2023, 11 : 100632 - 100645