Parameter Estimation Procedures for Deep Multi-Frame MVDR Filtering for Single-Microphone Speech Enhancement

Cited by: 1
Authors
Tammen, Marvin [1]
Doclo, Simon
Affiliations
[1] Carl von Ossietzky Univ Oldenburg, Dept Med Phys & Acoust, D-26129 Oldenburg, Germany
Keywords
Covariance matrices; Noise measurement; Speech enhancement; Interference; Estimation; Transforms; Filtering algorithms; Matrix structures; multi-frame filtering; MVDR filter; speech enhancement; supervised learning; SUBSPACE APPROACH; NOISE-REDUCTION; SEPARATION; NETWORKS;
DOI
10.1109/TASLP.2023.3306715
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
Aiming at exploiting temporal correlations across consecutive time frames in the short-time Fourier transform (STFT) domain, multi-frame algorithms for single-microphone speech enhancement have been proposed. Typically, the multi-frame filter coefficients are either estimated directly using deep neural networks or a certain filter structure is imposed, e.g., the multi-frame minimum variance distortionless response (MFMVDR) filter structure. Recently, it was shown that integrating the fully differentiable MFMVDR filter into an end-to-end supervised learning framework employing temporal convolutional networks (TCNs) allows for a high estimation accuracy of the required parameters, i.e., the speech inter-frame correlation vector and the interference covariance matrix. In this paper, we investigate different covariance matrix structures, namely Hermitian positive-definite, Hermitian positive-definite Toeplitz, and rank-1. The main differences between the considered matrix structures lie in the number of parameters that need to be estimated by the TCNs as well as the required linear algebra operations. For example, assuming a rank-1 matrix structure, we show that the MFMVDR filter can be written as a linear combination of the TCN outputs, significantly reducing computational complexity. In addition, we consider a covariance matrix estimation procedure based on recursive smoothing. Experimental results on the deep noise suppression challenge dataset show that the estimation procedure using the Hermitian positive-definite matrix structure yields the best performance, closely followed by the rank-1 matrix structure at a much lower complexity. Furthermore, imposing the MFMVDR filter structure instead of directly estimating the multi-frame filter coefficients slightly but consistently improves the speech enhancement performance.
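The filter structure named in the abstract can be illustrated with a minimal NumPy sketch. It assumes the commonly used closed form of the MFMVDR filter, w = Φ⁻¹γ / (γᴴΦ⁻¹γ), where Φ is the interference covariance matrix and γ the speech inter-frame correlation vector for one time-frequency bin. The function names, the fixed smoothing factor `lam`, and the regularisation `eps` are hypothetical choices made for illustration; the abstract does not specify how the authors parameterise the recursive smoothing or which signal it is applied to, and in the paper the parameters are estimated by TCNs rather than computed as below.

```python
import numpy as np


def mfmvdr_filter(phi_n, gamma_x):
    """Sketch of the MFMVDR filter for a single time-frequency bin.

    phi_n:   (N, N) Hermitian positive-definite interference covariance matrix
    gamma_x: (N,) speech inter-frame correlation vector
    """
    # Solve phi_n @ a = gamma_x rather than forming an explicit inverse.
    a = np.linalg.solve(phi_n, gamma_x)
    # Normalise so that w^H gamma_x = 1 (the distortionless constraint).
    # np.vdot conjugates its first argument, so this is gamma_x^H a.
    return a / np.vdot(gamma_x, a)


def recursive_smoothing(frames, lam=0.9, eps=1e-6):
    """Recursively smoothed covariance estimate (assumed fixed factor lam).

    frames: (T, N) complex array, one multi-frame vector per time step
    """
    n = frames.shape[1]
    # Small diagonal initialisation keeps the estimate positive definite.
    phi = eps * np.eye(n, dtype=complex)
    for y in frames:
        phi = lam * phi + (1.0 - lam) * np.outer(y, y.conj())
    return phi


def apply_filter(w, y):
    """Filter output w^H y for one multi-frame vector y."""
    return np.vdot(w, y)
```

Solving the linear system instead of inverting Φ is a numerical convenience of this sketch, not a statement about the linear algebra operations compared in the paper.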
Pages: 3237-3248
Number of pages: 12