Joint dereverberation and blind source separation using a hybrid autoregressive and convolutive transfer function-based model

Cited by: 0

Authors:

Liu, Shengdong [1,2]

Yang, Feiran [2,3]

Chen, Rilin [4]

Yang, Jun [1,2]

Affiliations:

[1] Chinese Acad Sci, Inst Acoust, Key Lab Noise & Vibrat Res, Beijing 100190, Peoples R China

[2] Univ Chinese Acad Sci, Beijing 100049, Peoples R China

[3] Chinese Acad Sci, State Key Lab Acoust, Inst Acoust, Beijing 100190, Peoples R China

[4] Tencent AI Lab, Beijing 100080, Peoples R China

Source:

APPLIED ACOUSTICS | 2024 / Vol. 224

Funding:

National Natural Science Foundation of China; Beijing Natural Science Foundation;

Keywords:

Convolutive transfer function; Autoregressive; Dereverberation; Blind source separation; Multichannel non-negative matrix factorization; NONNEGATIVE MATRIX FACTORIZATION; MIXTURES; DOMAIN; IDENTIFICATION; NOISE;

DOI:

10.1016/j.apacoust.2024.110135

Chinese Library Classification:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

Most frequency-domain blind source separation (BSS) methods are based on the multiplicative narrowband assumption, which is not valid in long reverberation environments. In contrast, convolutive transfer function (CTF)-based BSS methods do not rely on the narrowband assumption, and the separation performance is significantly improved compared to the traditional algorithms in long reverberation environments. However, the CTF-based BSS methods and their variants, e.g., autoregressive (AR) BSS methods, introduce modeling errors to some extent, due to the truncation or approximation during the optimization process. To address this problem, we propose a frequency-domain BSS method employing a hybrid AR and CTF model, which can provide more precise representations of the early reflections and late reverberations. Furthermore, we utilize the Gaussian noise model to deal with the BSS problem in noisy reverberant environments. We formulate the objective function using the maximum log-likelihood criterion, and derive an efficient iterative algorithm for parameter estimation with the block coordinate descent (BCD) method. Experimental results show that the proposed method has a better separation performance than the existing methods in long reverberation environments.
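
To illustrate the contrast the abstract draws between the multiplicative narrowband assumption and the convolutive transfer function (CTF) model, the following is a minimal sketch and not the authors' method. It does not implement the hybrid AR and CTF model, the Gaussian noise model, or the BCD algorithm. The function name `ctf_mix`, the array shapes, the filter length `L`, and the random data are all assumptions made for illustration. In the sketch, each microphone's STFT frame is a sum over the current and previous `L - 1` source frames in the same frequency bin; setting `L = 1` recovers the narrowband model.

```python
import numpy as np

def ctf_mix(S, A):
    """Mix sources in the STFT domain with a per-bin convolution over frames.

    S: (N, F, T) complex source STFTs (hypothetical data).
    A: (M, N, F, L) complex CTF coefficients; L = 1 is the narrowband case.
    Returns X: (M, F, T) microphone STFTs.
    """
    N, F, T = S.shape
    M, N2, F2, L = A.shape
    assert N == N2 and F == F2, "source/bin dimensions must agree"
    X = np.zeros((M, F, T), dtype=complex)
    for l in range(min(L, T)):
        # Frame t receives the contribution of source frame t - l.
        X[:, :, l:] += np.einsum("mnf,nft->mft", A[:, :, :, l], S[:, :, : T - l])
    return X

# Illustrative sizes only (assumptions, not values from the paper).
rng = np.random.default_rng(0)
N, M, F, T, L = 2, 2, 4, 50, 6
S = rng.standard_normal((N, F, T)) + 1j * rng.standard_normal((N, F, T))
decay = np.exp(-0.5 * np.arange(L))  # toy decaying filter taps
A = (rng.standard_normal((M, N, F, L)) + 1j * rng.standard_normal((M, N, F, L))) * decay

X_ctf = ctf_mix(S, A)            # convolutive model with L taps
X_nb = ctf_mix(S, A[..., :1])    # narrowband model: first tap only
# Relative energy that the narrowband model fails to explain in this toy setup.
rel_err = np.linalg.norm(X_ctf - X_nb) / np.linalg.norm(X_ctf)
print(f"relative narrowband modeling error: {rel_err:.3f}")
```

In this toy setup, a longer or more slowly decaying filter leaves more energy in the later taps, so the narrowband approximation explains less of the mixture. This is the situation the abstract refers to as long reverberation.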

Pages: 10
