Factorized MVDR Deep Beamforming for Multi-Channel Speech Enhancement

Cited by: 4

Authors:

Kim, Hansol [1]

Kang, Kyeongmuk [1]

Shin, Jong Won [1]

Affiliations:

[1] Gwangju Inst Sci & Technol, Sch Elect Engn & Comp Sci, Gwangju 61005, South Korea

Source:

IEEE SIGNAL PROCESSING LETTERS | 2022 / Vol. 29

Funding:

National Research Foundation, Singapore;

Keywords:

Speech enhancement; Estimation; Artificial neural networks; MISO communication; Array signal processing; Deep learning; Microphone arrays; Multi-channel speech enhancement; deep learning-based beamforming; factorized MVDR beamformer; NEURAL-NETWORK; SEPARATION; ATTENTION;

DOI:

10.1109/LSP.2022.3200581

Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Discipline classification code:

0808 ; 0809 ;

Abstract:

Traditionally, adaptive beamformers such as the minimum-variance distortionless response (MVDR) beamformer and the generalized eigenvalue beamformer have been widely used for multi-channel speech enhancement with a single-channel postfilter. Recently, several approaches have been proposed that use deep neural networks (DNNs) to enhance the signals used to estimate the speech and noise spatial covariance matrices (SCMs) and to process the outputs of the beamformers. However, preprocessing the signals for SCM estimation may disrupt the phase relations among the input signals, and the time averages used to estimate the speech and noise SCMs may not be optimal for beamformer performance even though the estimated signals are close to the ground truth. In this letter, we propose a deep beamforming approach that estimates factors of the MVDR beamformer using a DNN to circumvent the difficulty of speech and noise SCM estimation. We formulate the MVDR beamformer in a factorized form involving two complex factors and estimate them using a DNN with a cost function that compares the beamformed signal with the original clean speech. Experimental results showed that the proposed factorized MVDR beamformer could mimic the characteristics of the MVDR beamformer with the true relative transfer function and noise SCM, and that it outperformed the MVDR beamformer with deep learning-based pre- and post-processing in terms of perceptual evaluation of speech quality scores.
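For orientation, the reference system named in the abstract (the MVDR beamformer built from a relative transfer function and a noise SCM) can be sketched for a single frequency bin as below. This is a minimal illustration of the classical closed form w = Φ_n^{-1} d / (d^H Φ_n^{-1} d), not the factorized formulation proposed in the letter, whose two complex factors are not specified in this record. The function name `mvdr_weights`, the four-microphone setup, and the random test data are assumptions made for the example.

```python
import numpy as np


def mvdr_weights(noise_scm, rtf):
    """Classical MVDR weights for one frequency bin (illustrative helper).

    noise_scm: (M, M) Hermitian positive-definite noise spatial covariance matrix.
    rtf:       (M,) relative transfer function, with the reference-microphone
               entry equal to 1.
    """
    # Solve noise_scm @ x = rtf rather than forming an explicit inverse.
    numerator = np.linalg.solve(noise_scm, rtf)
    # Dividing by rtf^H noise_scm^{-1} rtf enforces the constraint w^H rtf = 1.
    return numerator / (rtf.conj() @ numerator)


# Hypothetical test data: 4 microphones, microphone 0 as the reference.
rng = np.random.default_rng(0)
M = 4
rtf = np.concatenate(
    ([1.0 + 0.0j], rng.standard_normal(M - 1) + 1j * rng.standard_normal(M - 1))
)
a = rng.standard_normal((M, 2 * M)) + 1j * rng.standard_normal((M, 2 * M))
noise_scm = a @ a.conj().T / (2 * M) + 1e-3 * np.eye(M)

w = mvdr_weights(noise_scm, rtf)

# Gain towards the target direction; the distortionless constraint makes it 1.
distortionless_gain = w.conj() @ rtf
# Residual noise power at the beamformer output, w^H noise_scm w.
mvdr_noise_power = (w.conj() @ noise_scm @ w).real
# Noise power when only the reference microphone is used (also distortionless).
reference_noise_power = noise_scm[0, 0].real
```

Because the reference-microphone selector also satisfies the distortionless constraint, `mvdr_noise_power` cannot exceed `reference_noise_power`; that comparison is a quick sanity check on any implementation of the weights.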


Pages: 1898-1902

Number of pages: 5

