Parameter Estimation Procedures for Deep Multi-Frame MVDR Filtering for Single-Microphone Speech Enhancement

Citations: 1
Authors
Tammen, Marvin [1]
Doclo, Simon
Affiliations
[1] Carl von Ossietzky Univ Oldenburg, Dept Med Phys & Acoust, D-26129 Oldenburg, Germany
Keywords
Covariance matrices; Noise measurement; Speech enhancement; Interference; Estimation; Transforms; Filtering algorithms; Matrix structures; multi-frame filtering; MVDR filter; speech enhancement; supervised learning; SUBSPACE APPROACH; NOISE-REDUCTION; SEPARATION; NETWORKS;
DOI
10.1109/TASLP.2023.3306715
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Aiming at exploiting temporal correlations across consecutive time frames in the short-time Fourier transform (STFT) domain, multi-frame algorithms for single-microphone speech enhancement have been proposed. Typically, the multi-frame filter coefficients are either estimated directly using deep neural networks or a certain filter structure is imposed, e.g., the multi-frame minimum variance distortionless response (MFMVDR) filter structure. Recently, it was shown that integrating the fully differentiable MFMVDR filter into an end-to-end supervised learning framework employing temporal convolutional networks (TCNs) allows for a high estimation accuracy of the required parameters, i.e., the speech inter-frame correlation vector and the interference covariance matrix. In this paper, we investigate different covariance matrix structures, namely Hermitian positive-definite, Hermitian positive-definite Toeplitz, and rank-1. The main differences between the considered matrix structures lie in the number of parameters that need to be estimated by the TCNs as well as the required linear algebra operations. For example, assuming a rank-1 matrix structure, we show that the MFMVDR filter can be written as a linear combination of the TCN outputs, significantly reducing computational complexity. In addition, we consider a covariance matrix estimation procedure based on recursive smoothing. Experimental results on the deep noise suppression challenge dataset show that the estimation procedure using the Hermitian positive-definite matrix structure yields the best performance, closely followed by the rank-1 matrix structure at a much lower complexity. Furthermore, imposing the MFMVDR filter structure instead of directly estimating the multi-frame filter coefficients slightly but consistently improves the speech enhancement performance.
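The two quantities named in the abstract, the speech inter-frame correlation vector and the interference covariance matrix, can be illustrated with a minimal NumPy sketch. It assumes the MFMVDR formulation commonly given in the literature, w = Φ⁻¹γ / (γᴴΦ⁻¹γ), and a first-order recursive smoothing update for the covariance matrix. Neither formula is stated in this record, the paper estimates these parameters with TCNs rather than from random data, and the function names, the smoothing constant `alpha`, and the frame count `N` are hypothetical choices made for illustration.

```python
import numpy as np

def mfmvdr_filter(phi, gamma):
    """Assumed textbook MFMVDR form: w = Phi^{-1} gamma / (gamma^H Phi^{-1} gamma)."""
    phi_inv_gamma = np.linalg.solve(phi, gamma)  # avoids forming Phi^{-1} explicitly
    return phi_inv_gamma / np.vdot(gamma, phi_inv_gamma)  # vdot conjugates its first argument

def recursive_smoothing(frames, alpha=0.9):
    """Assumed first-order update: Phi_t = alpha * Phi_{t-1} + (1 - alpha) * y_t y_t^H."""
    n = frames.shape[1]
    phi = np.zeros((n, n), dtype=complex)
    for y in frames:
        phi = alpha * phi + (1.0 - alpha) * np.outer(y, y.conj())
    return phi

# Synthetic stand-in data for one frequency bin (not real speech or noise).
rng = np.random.default_rng(0)
N = 5  # number of consecutive STFT frames per multi-frame vector (hypothetical)
frames = rng.standard_normal((200, N)) + 1j * rng.standard_normal((200, N))

# Small diagonal loading keeps the matrix safely invertible.
phi = recursive_smoothing(frames) + 1e-6 * np.eye(N)

# Stand-in correlation vector, normalized so the current-frame entry equals 1.
gamma = rng.standard_normal(N) + 1j * rng.standard_normal(N)
gamma = gamma / gamma[0]

w = mfmvdr_filter(phi, gamma)
distortionless = np.vdot(w, gamma)  # w^H gamma, equal to 1 up to rounding
```

By construction the filter satisfies the distortionless constraint wᴴγ = 1, which is the property that distinguishes the MVDR structure from directly estimated multi-frame filter coefficients.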
Pages: 3237-3248
Number of pages: 12
Related papers
31 in total
  • [21] A Single Channel Speech Enhancement Approach by Combining Statistical Criterion and Multi-Frame Sparse Dictionary Learning
    Tseng, Hung-Wei
    Vishnubhotla, Srikanth
    Hong, Mingyi
    Wang, Xiangfeng
    Xiao, Jinjun
    Luo, Zhi-Quan
    Zhang, Tao
    14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 451 - 455
  • [22] A New Algorithm for Noise PSD Matrix Estimation in Multi-Microphone Speech Enhancement Based on Recursive Smoothing
    Parchami, Mahdi
    Zhu, Wei-Ping
    Champagne, Benoit
    2015 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), 2015, : 429 - 432
  • [23] A SPEECH ENHANCEMENT ALGORITHM BY ITERATING SINGLE- AND MULTI-MICROPHONE PROCESSING AND ITS APPLICATION TO ROBUST ASR
    Zhang, Xueliang
    Wang, Zhong-Qiu
    Wang, DeLiang
    2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 276 - 280
  • [24] Single Channel Speech Enhancement: using Wiener Filtering with Recursive Noise Estimation
    Upadhyay, Navneet
    Jaiswal, Rahul Kumar
    PROCEEDING OF THE SEVENTH INTERNATIONAL CONFERENCE ON INTELLIGENT HUMAN COMPUTER INTERACTION (IHCI 2015), 2016, 84 : 22 - 30
  • [25] Improved Speech Spatial Covariance Matrix Estimation for Online Multi-Microphone Speech Enhancement
    Kim, Minseung
    Cheong, Sein
    Song, Hyungchan
    Shin, Jong Won
    SENSORS, 2023, 23 (01)
  • [26] Multi-scale decomposition based supervised single channel deep speech enhancement
    Saleem, Nasir
    Khattak, Muhammad Irfan
    APPLIED SOFT COMPUTING, 2020, 95
  • [27] Spectral Phase Estimation Based on Deep Neural Networks for Single Channel Speech Enhancement
    Saleem, N.
    Khattak, M. I.
    Perez, E. V.
    JOURNAL OF COMMUNICATIONS TECHNOLOGY AND ELECTRONICS, 2019, 64 (12) : 1372 - 1382
  • [29] Microphone Array Signal Processing and Deep Learning for Speech Enhancement: Combining model-based and data-driven approaches to parameter estimation and filtering [Special Issue On Model-Based and Data-Driven Audio Signal Processing]
    Haeb-Umbach, Reinhold
    Nakatani, Tomohiro
    Delcroix, Marc
    Boeddeker, Christoph
    Ochiai, Tsubasa
    IEEE SIGNAL PROCESSING MAGAZINE, 2024, 41 (06) : 12 - 23
  • [30] Multi-stage strength estimation network with cross attention for single channel speech enhancement
    Zhang, Zipeng
    Ding, Yuchen
    Chen, Wei
    Chen, Yutao
    Guo, Weiwei
    Liu, Houguang
    SIGNAL IMAGE AND VIDEO PROCESSING, 2024, 18 (10) : 6937 - 6948