Reverberant speech separation with probabilistic time-frequency masking for B-format recordings

Cited by: 25
Authors
Chen, Xiaoyi [1 ]
Wang, Wenwu [2 ]
Wang, Yingmin [1 ]
Zhong, Xionghu [3 ]
Alinaghi, Atiyeh [2 ]
Affiliations
[1] Northwestern Polytech Univ, Sch Marine Sci & Technol, Dept Acoust Engn, Xian 710072, Peoples R China
[2] Univ Surrey, Dept Elect Engn, Ctr Vis Speech & Signal Proc, Guildford GU2 7XH, Surrey, England
[3] Nanyang Technol Univ, Coll Engn, Sch Comp Engn, Singapore 639798, Singapore
Keywords
B-format signal; Acoustic intensity; Expectation-maximization (EM) algorithm; Blind source separation (BSS); Direction of arrival (DOA); BLIND SOURCE SEPARATION; INDEPENDENT COMPONENT ANALYSIS; DIRECTION-OF-ARRIVAL ESTIMATION; CONVOLUTIVE MIXTURES; ALGORITHMS; ROBUST; ICA
DOI
10.1016/j.specom.2015.01.002
CLC Classification Number
O42 [Acoustics]
Subject Classification Codes
070206; 082403
Abstract
Existing speech source separation approaches rely overwhelmingly on acoustic pressure information acquired with a microphone array. Little attention has been devoted to the use of B-format microphones, which capture both the acoustic pressure and the pressure gradient, so that direction of arrival (DOA) cues can be estimated from the received signals. In this paper, such DOA cues, together with frequency bin-wise mixing vector (MV) cues, are used to evaluate the contribution of a specific source at each time-frequency (T-F) point of the mixtures in order to separate that source from the mixture. A source separation algorithm is developed in which the DOA and MV cues are modelled by a von Mises mixture model and a complex Gaussian mixture model respectively, and the model parameters are estimated via an expectation-maximization (EM) algorithm. A T-F mask is then derived from the model parameters for recovering the sources. Moreover, we further improve the separation performance by retaining only the reliable DOA estimates at the T-F units based on thresholding. The performance of the proposed method is evaluated in both simulated room environments and a real reverberant studio in terms of signal-to-distortion ratio (SDR) and perceptual evaluation of speech quality (PESQ). The experimental results show its advantage over four baseline algorithms, including three T-F mask based approaches and one convolutive independent component analysis (ICA) based method. (C) 2015 Elsevier B.V. All rights reserved.
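As a rough illustration of the DOA branch of such an EM-based probabilistic masking scheme, the sketch below fits a von Mises mixture to per-T-F azimuth estimates and uses the resulting posteriors as soft T-F masks. This is a minimal sketch and not the authors' implementation: the intensity-based azimuth estimator, the Best-Fisher approximation for the concentration update, and all function and variable names (doa_from_bformat, vm_mixture_em, etc.) are assumptions made only for illustration.

```python
# Minimal, illustrative sketch (not the paper's implementation) of the DOA
# branch of an EM-fitted probabilistic T-F mask for B-format input.
# Assumptions: W (pressure) and X, Y (pressure gradients) are complex STFT
# arrays of equal shape; the intensity-based azimuth estimator and the
# Best-Fisher kappa approximation are common choices, not necessarily the
# paper's; all names here are hypothetical.
import numpy as np
from scipy.special import i0  # modified Bessel function of order 0

def doa_from_bformat(W, X, Y):
    """Per-T-F azimuth estimate from the active sound intensity."""
    return np.arctan2(np.real(np.conj(W) * Y), np.real(np.conj(W) * X))

def vm_pdf(theta, mu, kappa):
    """Von Mises density on the circle."""
    return np.exp(kappa * np.cos(theta - mu)) / (2.0 * np.pi * i0(kappa))

def inv_a(r):
    """Approximate inverse of A(kappa) = I1/I0 (Best & Fisher)."""
    if r < 0.53:
        return 2 * r + r**3 + 5 * r**5 / 6
    if r < 0.85:
        return -0.4 + 1.39 * r + 0.43 / (1 - r)
    return 1.0 / (r**3 - 4 * r**2 + 3 * r)

def vm_mixture_em(doa, n_src, n_iter=50, seed=0):
    """Fit an n_src-component von Mises mixture to flattened T-F DOA
    estimates (radians) by EM; the posteriors serve as soft T-F masks."""
    rng = np.random.default_rng(seed)
    mu = rng.uniform(-np.pi, np.pi, n_src)   # component mean directions
    kappa = np.full(n_src, 5.0)              # concentrations
    w = np.full(n_src, 1.0 / n_src)          # mixing weights
    for _ in range(n_iter):
        # E-step: posterior probability of each source at each T-F point
        lik = np.stack([w[k] * vm_pdf(doa, mu[k], kappa[k])
                        for k in range(n_src)], axis=1)          # (N, K)
        gamma = lik / np.maximum(lik.sum(axis=1, keepdims=True), 1e-12)
        # M-step: update weights, circular means, and concentrations
        nk = gamma.sum(axis=0)
        w = nk / len(doa)
        z = gamma.T @ np.exp(1j * doa)        # weighted resultant vectors
        mu = np.angle(z)
        r_bar = np.abs(z) / np.maximum(nk, 1e-12)
        kappa = np.array([inv_a(min(r, 0.99)) for r in r_bar])
    return gamma, mu  # gamma: soft masks (one column per source); mu: DOAs

# Hypothetical usage: gamma, _ = vm_mixture_em(doa_from_bformat(W, X, Y).ravel(), 2)
# Each column of gamma, reshaped to the spectrogram size, weights the mixture
# STFT before inverse transformation.
```

In the paper these DOA posteriors are further combined with posteriors from a complex Gaussian mixture model on the frequency bin-wise mixing vectors, and unreliable DOA estimates are discarded by thresholding before the mask is formed; those steps are omitted from the sketch above.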
Pages: 41 - 54
Page count: 14
Related Papers
50 records in total
  • [1] ACOUSTIC VECTOR SENSOR BASED REVERBERANT SPEECH SEPARATION WITH PROBABILISTIC TIME-FREQUENCY MASKING
    Zhong, Xionghu
    Chen, Xiaoyi
    Wang, Wenwu
    Alinaghi, Atiyeh
    Premkumar, A. B.
    2013 PROCEEDINGS OF THE 21ST EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2013
  • [2] SPATIAL AND COHERENCE CUES BASED TIME-FREQUENCY MASKING FOR BINAURAL REVERBERANT SPEECH SEPARATION
    Alinaghi, Atiyeh
    Wang, Wenwu
    Jackson, Philip J. B.
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013: 684 - 688
  • [3] Robust speech separation using time-frequency masking
    Aarabi, P
    Shi, GJ
    Jahromi, O
    2003 INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOL I, PROCEEDINGS, 2003: 741 - 744
  • [4] The Effect of Partial Time-Frequency Masking of the Direct Sound on the Perception of Reverberant Speech
    Madmoni, Lior
    Tibor, Shir
    Nelken, Israel
    Rafaely, Boaz
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29: 2037 - 2047
  • [5] Blind separation of speech mixtures via time-frequency masking
    Yilmaz, Ö
    Rickard, S
    IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2004, 52 (07): 1830 - 1847
  • [6] On time-frequency masking in voiced speech
    Skoglund, J
    Kleijn, WB
    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2000, 8 (04): 361 - 369
  • [7] On the integration of time-frequency masking speech separation and recognition in underdetermined environments
    Jafari, Ingrid
    Haque, Serajul
    Togneri, Roberto
    Nordholm, Sven
    2012 CONFERENCE RECORD OF THE FORTY SIXTH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS AND COMPUTERS (ASILOMAR), 2012: 1613 - 1617
  • [8] Blind speech source separation via nonlinear time-frequency masking
    Xu, Shun
    Chen, Shaorong
    Liu, Yulin
    Shengxue Xuebao/Acta Acustica, 2007, 32 (04): 375 - 381
  • [9] Blind speech source separation via nonlinear time-frequency masking
    Xu, Shun
    Chen, Shaorong
    Liu, Yulin
    Chinese Journal of Acoustics, 2008, (03): 203 - 214
  • [10] Loudspeaker localization using B-format recordings
    Gunel, B
    2003 IEEE WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS PROCEEDINGS, 2003: 59 - 62