Parameter Estimation Procedures for Deep Multi-Frame MVDR Filtering for Single-Microphone Speech Enhancement

Cited by: 1

Authors:

Tammen, Marvin [1]

Doclo, Simon

Affiliations:

[1] Carl von Ossietzky Univ Oldenburg, Dept Med Phys & Acoust, D-26129 Oldenburg, Germany

Source:

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2023 / Vol. 31

Keywords:

Covariance matrices; Noise measurement; Speech enhancement; Interference; Estimation; Transforms; Filtering algorithms; Matrix structures; multi-frame filtering; MVDR filter; speech enhancement; supervised learning; SUBSPACE APPROACH; NOISE-REDUCTION; SEPARATION; NETWORKS;

DOI:

10.1109/TASLP.2023.3306715

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

Aiming at exploiting temporal correlations across consecutive time frames in the short-time Fourier transform (STFT) domain, multi-frame algorithms for single-microphone speech enhancement have been proposed. Typically, the multi-frame filter coefficients are either estimated directly using deep neural networks or a certain filter structure is imposed, e.g., the multi-frame minimum variance distortionless response (MFMVDR) filter structure. Recently, it was shown that integrating the fully differentiable MFMVDR filter into an end-to-end supervised learning framework employing temporal convolutional networks (TCNs) allows for a high estimation accuracy of the required parameters, i.e., the speech inter-frame correlation vector and the interference covariance matrix. In this paper, we investigate different covariance matrix structures, namely Hermitian positive-definite, Hermitian positive-definite Toeplitz, and rank-1. The main differences between the considered matrix structures lie in the number of parameters that need to be estimated by the TCNs as well as the required linear algebra operations. For example, assuming a rank-1 matrix structure, we show that the MFMVDR filter can be written as a linear combination of the TCN outputs, significantly reducing computational complexity. In addition, we consider a covariance matrix estimation procedure based on recursive smoothing. Experimental results on the deep noise suppression challenge dataset show that the estimation procedure using the Hermitian positive-definite matrix structure yields the best performance, closely followed by the rank-1 matrix structure at a much lower complexity. Furthermore, imposing the MFMVDR filter structure instead of directly estimating the multi-frame filter coefficients slightly but consistently improves the speech enhancement performance.
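As a rough illustration of the filter structure named in the abstract, the sketch below computes a textbook MVDR-style solution from an interference covariance matrix and a speech inter-frame correlation vector. It is not the authors' implementation: the function name `mfmvdr_filter`, the variable names, the random test data, and the convention that the current frame is the first vector entry are all assumptions made for this example, and the TCN-based parameter estimation described in the abstract is not modeled.

```python
import numpy as np


def mfmvdr_filter(phi_i, gamma_x):
    """Textbook MVDR-style filter: w = Phi^{-1} gamma / (gamma^H Phi^{-1} gamma).

    phi_i   : (N, N) Hermitian positive-definite interference covariance matrix
    gamma_x : (N,) speech inter-frame correlation vector (assumed normalized
              so that the entry for the current frame equals 1)
    """
    # Solve Phi u = gamma rather than forming the explicit inverse.
    u = np.linalg.solve(phi_i, gamma_x)
    # np.vdot conjugates its first argument, so this is gamma^H Phi^{-1} gamma.
    return u / np.vdot(gamma_x, u)


# Hypothetical data: N = 5 consecutive STFT frames in one frequency bin.
rng = np.random.default_rng(0)
N = 5
A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
phi_i = A @ A.conj().T + N * np.eye(N)  # Hermitian positive-definite by construction
gamma_x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
gamma_x = gamma_x / gamma_x[0]  # current frame placed first (assumption)

w = mfmvdr_filter(phi_i, gamma_x)
constraint = np.vdot(w, gamma_x)  # w^H gamma_x, should be close to 1
```

The value `constraint` checks the distortionless property: for a Hermitian positive-definite covariance matrix, the denominator is real and positive, so `w^H gamma_x` equals 1 up to rounding error.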

Pages: 3237-3248

Page count: 12

31 records in total

[21] A Single Channel Speech Enhancement Approach by Combining Statistical Criterion and Multi-Frame Sparse Dictionary Learning
Tseng, Hung-Wei
Vishnubhotla, Srikanth
Hong, Mingyi
Wang, Xiangfeng
Xiao, Jinjun
Luo, Zhi-Quan
Zhang, Tao
14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013 : 451 - 455
[22] A New Algorithm for Noise PSD Matrix Estimation in Multi-Microphone Speech Enhancement Based on Recursive Smoothing
Parchami, Mahdi
Zhu, Wei-Ping
Champagne, Benoit
2015 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), 2015 : 429 - 432
[23] A SPEECH ENHANCEMENT ALGORITHM BY ITERATING SINGLE- AND MULTI-MICROPHONE PROCESSING AND ITS APPLICATION TO ROBUST ASR
Zhang, Xueliang
Wang, Zhong-Qiu
Wang, DeLiang
2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017 : 276 - 280
[24] Single Channel Speech Enhancement: using Wiener Filtering with Recursive Noise Estimation
Upadhyay, Navneet
Jaiswal, Rahul Kumar
PROCEEDING OF THE SEVENTH INTERNATIONAL CONFERENCE ON INTELLIGENT HUMAN COMPUTER INTERACTION (IHCI 2015), 2016, 84 : 22 - 30
[25] Improved Speech Spatial Covariance Matrix Estimation for Online Multi-Microphone Speech Enhancement
Kim, Minseung
Cheong, Sein
Song, Hyungchan
Shin, Jong Won
SENSORS, 2023, 23 (01)
[26] Multi-scale decomposition based supervised single channel deep speech enhancement
Saleem, Nasir
Khattak, Muhammad Irfan
APPLIED SOFT COMPUTING, 2020, 95
[27] Spectral Phase Estimation Based on Deep Neural Networks for Single Channel Speech Enhancement
Saleem, N.
Khattak, M. I.
Perez, E. V.
JOURNAL OF COMMUNICATIONS TECHNOLOGY AND ELECTRONICS, 2019, 64 (12) : 1372 - 1382
[28] Spectral Phase Estimation Based on Deep Neural Networks for Single Channel Speech Enhancement
N. Saleem
M. I. Khattak
E. V. Perez
Journal of Communications Technology and Electronics, 2019, 64 : 1372 - 1382
[29] Microphone Array Signal Processing and Deep Learning for Speech Enhancement: Combining model-based and data-driven approaches to parameter estimation and filtering [Special Issue On Model-Based and Data-Driven Audio Signal Processing]
Haeb-Umbach, Reinhold
Nakatani, Tomohiro
Delcroix, Marc
Boeddeker, Christoph
Ochiai, Tsubasa
IEEE SIGNAL PROCESSING MAGAZINE, 2024, 41 (06) : 12 - 23
[30] Multi-stage strength estimation network with cross attention for single channel speech enhancement
Zhang, Zipeng
Ding, Yuchen
Chen, Wei
Chen, Yutao
Guo, Weiwei
Liu, Houguang
SIGNAL IMAGE AND VIDEO PROCESSING, 2024, 18 (10) : 6937 - 6948