Reverberant Source Separation Using NTF With Delayed Subsources and Spatial Priors

Cited by: 1

Authors:

Fras, Mieszko [1]

Kowalczyk, Konrad [1]

Affiliations:

[1] AGH Univ Krakow, Inst Elect, Fac Comp Sci Elect & Telecommun, PL-30059 Krakow, Poland

Source:

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2024 / Vol. 32

Keywords:

array signal processing; convolutive nonnegative matrix factorization; room reverberation; Source separation; NONNEGATIVE MATRIX FACTORIZATION; DEREVERBERATION; QUALITY;

DOI:

10.1109/TASLP.2024.3374065

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification code:

070206 ; 082403 ;

Abstract:

Speech signals recorded by distant microphones are often contaminated with room reverberation and signals of interfering speakers. This article addresses the problem of joint source separation and dereverberation using multichannel nonnegative tensor factorization (NTF) in which late reverberant components are modeled using the so-called delayed subsources. The article formulates two distinct signal models of the time-frequency spectrum of the multichannel microphone mixture, in which reverberation is modeled either independently for each source using delayed source variances or jointly using delayed microphone signals. In addition, it defines computationally efficient variants of these two methods with a simplified spatial model in which spatial properties of the late reverberant components are estimated jointly for all delays. For each of the four distinct algorithms, the article first formulates a maximum a posteriori (MAP) estimator based on the NTF model with the localization prior over the mixing matrix that is suitable for the estimation of the early reverberation (primarily the direct-path) signals in a reverberant environment. Next, it derives update equations for the four resulting expectation-maximization algorithms, which are thoroughly evaluated and shown to outperform similar state-of-the-art approaches. The results of experimental evaluations, performed using real and simulated data for determined, over-determined, and under-determined scenarios, indicate superior performance of the proposed processing over the state of the art in terms of standard source separation and dereverberation metrics.

Citation

Pages: 1954-1967

Page count: 14

References: 42 in total
