Joint dereverberation and blind source separation using a hybrid autoregressive and convolutive transfer function-based model

Authors
Liu, Shengdong [1, 2]
Yang, Feiran [2, 3]
Chen, Rilin [4]
Yang, Jun [1, 2]
Affiliations
[1] Chinese Acad Sci, Inst Acoust, Key Lab Noise & Vibrat Res, Beijing 100190, Peoples R China
[2] Univ Chinese Acad Sci, Beijing 100049, Peoples R China
[3] Chinese Acad Sci, State Key Lab Acoust, Inst Acoust, Beijing 100190, Peoples R China
[4] Tencent AI Lab, Beijing 100080, Peoples R China
Funding
National Natural Science Foundation of China; Beijing Natural Science Foundation
Keywords
Convolutive transfer function; Autoregressive; Dereverberation; Blind source separation; Multichannel non-negative matrix factorization; NONNEGATIVE MATRIX FACTORIZATION; MIXTURES; DOMAIN; IDENTIFICATION; NOISE;
DOI
10.1016/j.apacoust.2024.110135
Chinese Library Classification number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Most frequency-domain blind source separation (BSS) methods are based on the multiplicative narrowband assumption, which is not valid in long reverberation environments. In contrast, convolutive transfer function (CTF)-based BSS methods do not rely on the narrowband assumption, and the separation performance is significantly improved compared to the traditional algorithms in long reverberation environments. However, the CTF-based BSS methods and their variants, e.g., autoregressive (AR) BSS methods, introduce modeling errors to some extent, due to the truncation or approximation during the optimization process. To address this problem, we propose a frequency-domain BSS method employing a hybrid AR and CTF model, which can provide more precise representations of the early reflections and late reverberations. Furthermore, we utilize the Gaussian noise model to deal with the BSS problem in noisy reverberant environments. We formulate the objective function using the maximum log-likelihood criterion, and derive an efficient iterative algorithm for parameter estimation with the block coordinate descent (BCD) method. Experimental results show that the proposed method has a better separation performance than the existing methods in long reverberation environments.
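To make the contrast in the abstract concrete, the following is a minimal sketch of the two mixing models for a single frequency bin: the multiplicative narrowband model (one complex gain per source-microphone pair) and the CTF model (a short filter across STFT frames). The function names, the nested-list data layout, and the toy filter taps are assumptions chosen for illustration; this is not the authors' algorithm, and it does not cover the AR component, the noise model, or the BCD parameter estimation, none of which are specified in the abstract.

```python
def mix_narrowband(S, A):
    """Narrowband model for one frequency bin: x[m][n] = sum_k A[m][k] * S[k][n].

    S: K sources x N frames (complex STFT coefficients).
    A: M microphones x K sources (one complex gain per pair).
    """
    M, K, N = len(A), len(S), len(S[0])
    return [[sum(A[m][k] * S[k][n] for k in range(K)) for n in range(N)]
            for m in range(M)]


def mix_ctf(S, H):
    """CTF model for one frequency bin: x[m][n] = sum_l sum_k H[l][m][k] * S[k][n - l].

    H: L taps x M microphones x K sources. Frames before n = 0 are treated as zero.
    """
    L, M, K, N = len(H), len(H[0]), len(S), len(S[0])
    X = [[0j] * N for _ in range(M)]
    for m in range(M):
        for n in range(N):
            for l in range(min(L, n + 1)):  # only taps that reach a valid frame
                for k in range(K):
                    X[m][n] += H[l][m][k] * S[k][n - l]
    return X


# Toy example (hypothetical values): one source, one microphone,
# an impulse in frame 0, and a three-tap decaying filter.
S = [[1 + 0j, 0j, 0j, 0j]]
H = [[[1.0]], [[0.5]], [[0.25]]]

x_ctf = mix_ctf(S, H)            # energy spreads over frames 0, 1 and 2
x_nb = mix_narrowband(S, H[0])   # only the first tap is modeled
```

In this toy case the narrowband model keeps only the frame-0 response, while the CTF model carries the impulse into the next two frames. That inter-frame spreading is what the narrowband assumption cannot represent when the reverberation is long relative to the STFT window.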
Pages: 10