Multi-microphone simultaneous speakers detection and localization of multi-sources for separation and noise reduction

Cited by: 0
Authors
Schwartz, Ayal [1, 2]
Schwartz, Ofer [1]
Chazan, Shlomo E. [1, 2]
Gannot, Sharon [1]
Affiliations
[1] Bar Ilan Univ, Fac Engn, IL-5290002 Ramat Gan, Israel
[2] Origin AI, Ramat Gan, Israel
Keywords
LCMV beamforming; Relative transfer function estimation; DOA estimation; Speech activity detection; Multi-task deep learning; Blind source separation; Speech enhancement
DOI
10.1186/s13636-024-00365-3
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
This paper addresses the challenge of online blind speaker separation in a multi-microphone setting. The linearly constrained minimum variance (LCMV) beamformer is selected as the backbone of the separation algorithm because of its distortionless response and its capacity to place nulls toward interfering sources. A specific variant of the LCMV beamformer that accounts for acoustic propagation is implemented: the relative transfer functions (RTFs) associated with each speaker of interest serve as the steering vectors of the beamformer. A control mechanism, comprising speaker activity detectors and direction of arrival (DOA) estimation branches, is devised to ensure robust estimation of the beamformer's building blocks. This control mechanism is implemented as a multi-task deep neural network (DNN). The primary task classifies each time frame by speaker activity: no active speaker, a single active speaker, or multiple active speakers. The secondary task, DOA estimation, is implemented as a classification task and executed only for frames that the primary branch classifies as single-speaker frames; the direction of the active speaker is assigned to one of several angular ranges. These single-speaker frames are also used to estimate the RTFs with subspace estimation methods. A library of RTFs associated with these DOA ranges is then constructed, enabling rapid acquisition of new speakers and efficient tracking of existing ones. The proposed scheme is evaluated on both simulated and real-life recordings, covering static and dynamic scenarios. The results demonstrate the benefits of the multi-task approach, with significant improvements even when the control mechanism is trained on simulated data and tested on real-life data. A comparison with the independent low-rank matrix analysis (ILRMA) algorithm shows significant improvements of the proposed scheme in static scenarios. Furthermore, the tracking capabilities of the proposed scheme are demonstrated in dynamic scenarios.
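The abstract describes estimating RTFs from single-speaker frames with subspace methods, storing them in a library indexed by DOA range, and using them as LCMV steering vectors. The sketch below illustrates these three steps for a single frequency bin under stated assumptions. It is not the authors' implementation. The RTF estimator takes the principal eigenvector of the sample spatial covariance, which is a basic subspace method and not necessarily the paper's exact estimator. The function names (`estimate_rtf`, `update_library`, `lcmv_weights`), the recursive-averaging library update, the DOA class indices, the identity noise covariance, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np

def estimate_rtf(frames, ref_mic=0):
    # frames: (num_mics, num_frames) STFT coefficients of ONE frequency bin,
    # assumed to come from frames labelled "single active speaker".
    cov = frames @ frames.conj().T / frames.shape[1]  # sample spatial covariance
    _, eigvecs = np.linalg.eigh(cov)                   # eigenvalues in ascending order
    principal = eigvecs[:, -1]                         # dominant (speech) subspace
    return principal / principal[ref_mic]              # reference-mic entry becomes 1

def update_library(library, doa_bin, rtf, alpha=0.9):
    # Hypothetical RTF library keyed by DOA class. Recursive averaging is an
    # illustrative assumption, not necessarily the paper's update rule.
    if doa_bin in library:
        library[doa_bin] = alpha * library[doa_bin] + (1 - alpha) * rtf
    else:
        library[doa_bin] = rtf
    return library

def lcmv_weights(noise_cov, rtfs, desired):
    # LCMV solution w = R^{-1} C (C^H R^{-1} C)^{-1} g, which satisfies C^H w = g.
    r_inv_c = np.linalg.solve(noise_cov, rtfs)
    gram = rtfs.conj().T @ r_inv_c
    return r_inv_c @ np.linalg.solve(gram, desired)

# Synthetic demo for one frequency bin (made-up data, 4 microphones, 2 speakers).
rng = np.random.default_rng(0)
num_mics, num_frames = 4, 500

def random_rtf():
    h = rng.standard_normal(num_mics) + 1j * rng.standard_normal(num_mics)
    return h / h[0]  # RTF relative to microphone 0

true_rtfs = [random_rtf(), random_rtf()]
library = {}
for doa_bin, h in zip([3, 7], true_rtfs):  # hypothetical DOA class indices
    s = rng.standard_normal(num_frames) + 1j * rng.standard_normal(num_frames)
    noise = 0.01 * (rng.standard_normal((num_mics, num_frames))
                    + 1j * rng.standard_normal((num_mics, num_frames)))
    frames = np.outer(h, s) + noise  # single-speaker observations
    update_library(library, doa_bin, estimate_rtf(frames))

# Steering matrix from the library; identity noise covariance is an assumption.
C = np.column_stack([library[3], library[7]])
w = lcmv_weights(np.eye(num_mics), C, np.array([1.0, 0.0]))
# Constraint check: unit response toward the first speaker, null toward the second.
print(np.round(C.conj().T @ w, 6))
```

Setting the response vector to `[1, 0]` keeps the first speaker undistorted and nulls the second. Swapping it to `[0, 1]` extracts the other speaker, so one LCMV beamformer per speaker, built from the same library entries, yields the separated outputs.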
Pages: 15