Parameter Estimation Procedures for Deep Multi-Frame MVDR Filtering for Single-Microphone Speech Enhancement

Cited by: 1

Authors:

Tammen, Marvin [1]

Doclo, Simon

Affiliations:

[1] Carl von Ossietzky Univ Oldenburg, Dept Med Phys & Acoust, D-26129 Oldenburg, Germany

Source:

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2023 / Vol. 31

Keywords:

Covariance matrices; Noise measurement; Speech enhancement; Interference; Estimation; Transforms; Filtering algorithms; Matrix structures; multi-frame filtering; MVDR filter; speech enhancement; supervised learning; SUBSPACE APPROACH; NOISE-REDUCTION; SEPARATION; NETWORKS;

DOI:

10.1109/TASLP.2023.3306715

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

Aiming at exploiting temporal correlations across consecutive time frames in the short-time Fourier transform (STFT) domain, multi-frame algorithms for single-microphone speech enhancement have been proposed. Typically, the multi-frame filter coefficients are either estimated directly using deep neural networks or a certain filter structure is imposed, e.g., the multi-frame minimum variance distortionless response (MFMVDR) filter structure. Recently, it was shown that integrating the fully differentiable MFMVDR filter into an end-to-end supervised learning framework employing temporal convolutional networks (TCNs) allows for a high estimation accuracy of the required parameters, i.e., the speech inter-frame correlation vector and the interference covariance matrix. In this paper, we investigate different covariance matrix structures, namely Hermitian positive-definite, Hermitian positive-definite Toeplitz, and rank-1. The main differences between the considered matrix structures lie in the number of parameters that need to be estimated by the TCNs as well as the required linear algebra operations. For example, assuming a rank-1 matrix structure, we show that the MFMVDR filter can be written as a linear combination of the TCN outputs, significantly reducing computational complexity. In addition, we consider a covariance matrix estimation procedure based on recursive smoothing. Experimental results on the deep noise suppression challenge dataset show that the estimation procedure using the Hermitian positive-definite matrix structure yields the best performance, closely followed by the rank-1 matrix structure at a much lower complexity. Furthermore, imposing the MFMVDR filter structure instead of directly estimating the multi-frame filter coefficients slightly but consistently improves the speech enhancement performance.
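As a rough illustration of the quantities named in the abstract, the following NumPy sketch computes the textbook MFMVDR weights, w = Phi^{-1} gamma / (gamma^H Phi^{-1} gamma), from an interference covariance matrix Phi and a speech inter-frame correlation vector gamma, with Phi obtained by recursive smoothing. It is not the authors' implementation: in the paper these parameters are estimated by TCNs, whereas here they come from random data. The function names, the smoothing constant `alpha`, the diagonal loading, and the filter length `N` are all assumptions made for this example.

```python
import numpy as np

def recursive_smoothing(frames, alpha=0.9):
    """Recursively smoothed covariance estimate:
    Phi_l = alpha * Phi_{l-1} + (1 - alpha) * y_l y_l^H (alpha is an assumed value)."""
    n = frames.shape[1]
    phi = np.zeros((n, n), dtype=complex)
    for y in frames:
        phi = alpha * phi + (1.0 - alpha) * np.outer(y, y.conj())
    return phi

def mfmvdr_filter(phi_u, gamma_x, diag_load=1e-6):
    """Textbook MFMVDR weights w = Phi^{-1} gamma / (gamma^H Phi^{-1} gamma)."""
    n = phi_u.shape[0]
    # Diagonal loading (an assumption of this sketch) keeps the solve well conditioned.
    phi = phi_u + diag_load * np.trace(phi_u).real / n * np.eye(n)
    num = np.linalg.solve(phi, gamma_x)      # Phi^{-1} gamma without an explicit inverse
    return num / (gamma_x.conj() @ num)      # normalize so that w^H gamma = 1

# Synthetic stand-ins for one frequency bin: N consecutive STFT frames per vector.
rng = np.random.default_rng(0)
N = 5
noise = rng.standard_normal((200, N)) + 1j * rng.standard_normal((200, N))
phi_u = recursive_smoothing(noise)

gamma_x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
gamma_x = gamma_x / gamma_x[0]               # normalized so the current-frame entry is 1

w = mfmvdr_filter(phi_u, gamma_x)
constraint = w.conj() @ gamma_x              # distortionless constraint, w^H gamma
```

In this sketch the distortionless constraint w^H gamma = 1 holds up to numerical precision, which gives a quick sanity check on any estimated pair (Phi, gamma).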


Pages: 3237-3248

Number of pages: 12

31 entries in total

[1] DEEP MULTI-FRAME MVDR FILTERING FOR SINGLE-MICROPHONE SPEECH ENHANCEMENT
Tammen, Marvin
Doclo, Simon
2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 8443 - 8447
[2] SUBSPACE-BASED SPEECH CORRELATION VECTOR ESTIMATION FOR SINGLE-MICROPHONE MULTI-FRAME MVDR FILTERING
Fischer, Doerte
Doclo, Simon
2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 856 - 860
[3] Sensitivity Analysis of the Multi-Frame MVDR Filter for Single-Microphone Speech Enhancement
Fischer, Doerte
Doclo, Simon
2017 25TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2017, : 603 - 607
[4] Comparison of Parameter Estimation Methods for Single-Microphone Multi-Frame Wiener Filtering
Fischer, Doerte
Bruemann, Klaus
Doclo, Simon
2019 27TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2019,
[5] SINGLE-MICROPHONE SPEECH ENHANCEMENT USING MVDR FILTERING AND WIENER POST-FILTERING
Fischer, Doerte
Gerkmann, Timo
2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 201 - 205
[6] ROBUST MMSE FILTERING FOR SINGLE-MICROPHONE SPEECH ENHANCEMENT
Enzner, Gerald
Thuene, Philipp
2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 4009 - 4013
[7] DEEP MULTI-FRAME MVDR FILTERING FOR BINAURAL NOISE REDUCTION
Tammen, Marvin
Doclo, Simon
2022 INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC 2022), 2022,
[8] ROBUST CONSTRAINED MFMVDR FILTERING FOR SINGLE-MICROPHONE SPEECH ENHANCEMENT
Fischer, Doerte
Doclo, Simon
2018 16TH INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC), 2018, : 41 - 45
[9] DNN-BASED SPEECH PRESENCE PROBABILITY ESTIMATION FOR MULTI-FRAME SINGLE-MICROPHONE SPEECH ENHANCEMENT
Tammen, Marvin
Fischer, Doerte
Meyer, Bernd T.
Doclo, Simon
2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 191 - 195
[10] Two-Stage Single-Channel Speech Enhancement with Multi-Frame Filtering
Lin, Shaoxiong
Zhang, Wangyou
Qian, Yanmin
APPLIED SCIENCES-BASEL, 2023, 13 (08):
