Parameter Estimation Procedures for Deep Multi-Frame MVDR Filtering for Single-Microphone Speech Enhancement

Cited by: 1

Authors:

Tammen, Marvin [1]

Doclo, Simon

Affiliations:

[1] Carl von Ossietzky Univ Oldenburg, Dept Med Phys & Acoust, D-26129 Oldenburg, Germany

Source:

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2023 / Vol. 31

Keywords:

Covariance matrices; Noise measurement; Speech enhancement; Interference; Estimation; Transforms; Filtering algorithms; Matrix structures; multi-frame filtering; MVDR filter; speech enhancement; supervised learning; SUBSPACE APPROACH; NOISE-REDUCTION; SEPARATION; NETWORKS;

DOI:

10.1109/TASLP.2023.3306715

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

Aiming at exploiting temporal correlations across consecutive time frames in the short-time Fourier transform (STFT) domain, multi-frame algorithms for single-microphone speech enhancement have been proposed. Typically, the multi-frame filter coefficients are either estimated directly using deep neural networks or a certain filter structure is imposed, e.g., the multi-frame minimum variance distortionless response (MFMVDR) filter structure. Recently, it was shown that integrating the fully differentiable MFMVDR filter into an end-to-end supervised learning framework employing temporal convolutional networks (TCNs) allows for a high estimation accuracy of the required parameters, i.e., the speech inter-frame correlation vector and the interference covariance matrix. In this paper, we investigate different covariance matrix structures, namely Hermitian positive-definite, Hermitian positive-definite Toeplitz, and rank-1. The main differences between the considered matrix structures lie in the number of parameters that need to be estimated by the TCNs as well as the required linear algebra operations. For example, assuming a rank-1 matrix structure, we show that the MFMVDR filter can be written as a linear combination of the TCN outputs, significantly reducing computational complexity. In addition, we consider a covariance matrix estimation procedure based on recursive smoothing. Experimental results on the deep noise suppression challenge dataset show that the estimation procedure using the Hermitian positive-definite matrix structure yields the best performance, closely followed by the rank-1 matrix structure at a much lower complexity. Furthermore, imposing the MFMVDR filter structure instead of directly estimating the multi-frame filter coefficients slightly but consistently improves the speech enhancement performance.


Pages: 3237-3248

Page count: 12

