Reverberant speech separation with probabilistic time-frequency masking for B-format recordings

被引:25
|
作者
Chen, Xiaoyi [1 ]
Wang, Wenwu [2 ]
Wang, Yingmin [1 ]
Zhong, Xionghu [3 ]
Alinaghi, Atiyeh [2 ]
机构
[1] Northwestern Polytech Univ, Sch Marine Sci & Technol, Dept Acoust Engn, Xian 710072, Peoples R China
[2] Univ Surrey, Dept Elect Engn, Ctr Vis Speech & Signal Proc, Guildford GU2 7XH, Surrey, England
[3] Nanyang Technol Univ, Coll Engn, Sch Comp Engn, Singapore 639798, Singapore
关键词
B-format signal; Acoustic intensity; Expectation-maximization (EM) algorithm; Blind source separation (BSS); Direction of arrival (DOA); BLIND SOURCE SEPARATION; INDEPENDENT COMPONENT ANALYSIS; OF-ARRIVAL ESTIMATION; CONVOLUTIVE MIXTURES; ALGORITHMS; ROBUST; ICA;
D O I
10.1016/j.specom.2015.01.002
中图分类号
O42 [声学];
学科分类号
070206 ; 082403 ;
摘要
Existing speech source separation approaches overwhelmingly rely on acoustic pressure information acquired by using a microphone array. Little attention has been devoted to the usage of B-format microphones, by which both acoustic pressure and pressure gradient can be obtained, and therefore the direction of arrival (DOA) cues can be estimated from the received signal. In this paper, such DOA cues, together with the frequency bin-wise mixing vector (MV) cues, are used to evaluate the contribution of a specific source at each time frequency (T-F) point of the mixtures in order to separate the source from the mixture. Based on the von Mises mixture model and the complex Gaussian mixture model respectively, a source separation algorithm is developed, where the model parameters are estimated via an expectation-maximization (EM) algorithm. A T-F mask is then derived from the model parameters for recovering the sources. Moreover, we further improve the separation performance by choosing only the reliable DOA estimates at the T-F units based on thresholding. The performance of the proposed method is evaluated in both simulated room environments and a real reverberant studio in terms of signal-to-distortion ratio (SDR) and the perceptual evaluation of speech quality (PESQ). The experimental results show its advantage over four baseline algorithms including three T-F mask based approaches and one convolutive independent component analysis (ICA) based method. (C) 2015 Elsevier B.V. All rights reserved.
引用
收藏
页码:41 / 54
页数:14
相关论文
共 50 条
  • [31] Multi-Channel Bin-Wise Speech Separation Combining Time-Frequency Masking and Beamforming
    Bella, Mostafa
    Saylani, Hicham
    Hosseini, Shahram
    Deville, Yannick
    IEEE ACCESS, 2023, 11 : 100632 - 100645
  • [32] Noisy-reverberant Speech Enhancement Using DenseUNet with Time-frequency Attention
    Zhao, Yan
    Wang, DeLiang
    INTERSPEECH 2020, 2020, : 3261 - 3265
  • [33] SOFT MASKING BASED ADAPTATION FOR TIME-FREQUENCY BEAMFORMERS UNDER REVERBERANT AND BACKGROUND NOISE ENVIRONMENTS
    Kawaguchi, Yohei
    Togami, Masahito
    18TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO-2010), 2010, : 736 - 740
  • [34] Label Driven Time-Frequency Masking for Robust Continuous Speech Recognition
    Soni, Meet
    Panda, Ashish
    INTERSPEECH 2019, 2019, : 426 - 430
  • [35] Speech intelligibility in background noise with ideal binary time-frequency masking
    Wang, DeLiang
    Kjems, Ulrik
    Pedersen, Michael S.
    Boldt, Jesper B.
    Lunner, Thomas
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2009, 125 (04): : 2336 - 2347
  • [36] Perceptual effects of noise reduction by time-frequency masking of noisy speech
    Brons, Inge
    Houben, Rolph
    Dreschler, Wouter A.
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2012, 132 (04): : 2690 - 2699
  • [37] Review of Time-Frequency Masking Approach for Improving Speech Intelligibility in Noise
    Kim, Gibak
    IETE TECHNICAL REVIEW, 2022, 39 (03) : 623 - 634
  • [38] Isolating the energetic component of speech-on-speech masking with ideal time-frequency segregation
    Brungart, Douglas S.
    Chang, Peter S.
    Simpson, Brian D.
    Wang, DeLiang
    Journal of the Acoustical Society of America, 2006, 120 (06): : 4007 - 4018
  • [39] Continuous Time-Frequency Masking Method for Blind Speech Separation with Adaptive Choice of Threshold Parameter Using ICA
    Koldovsky, Zbynek
    Nouza, Jan
    Kolorenc, Jan
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 2578 - 2581
  • [40] Separation and robust recognition of noisy, convolutive speech mixtures using time-frequency masking and missing data techniques
    Kolossa, D
    Klimas, A
    Orglmeister, R
    2005 WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS (WASPAA), 2005, : 82 - 85