Reverberant speech separation with probabilistic time-frequency masking for B-format recordings

Cited by: 25
Authors
Chen, Xiaoyi [1 ]
Wang, Wenwu [2 ]
Wang, Yingmin [1 ]
Zhong, Xionghu [3 ]
Alinaghi, Atiyeh [2 ]
Affiliations
[1] Northwestern Polytech Univ, Sch Marine Sci & Technol, Dept Acoust Engn, Xian 710072, Peoples R China
[2] Univ Surrey, Dept Elect Engn, Ctr Vis Speech & Signal Proc, Guildford GU2 7XH, Surrey, England
[3] Nanyang Technol Univ, Coll Engn, Sch Comp Engn, Singapore 639798, Singapore
Keywords
B-format signal; Acoustic intensity; Expectation-maximization (EM) algorithm; Blind source separation (BSS); Direction of arrival (DOA); BLIND SOURCE SEPARATION; INDEPENDENT COMPONENT ANALYSIS; OF-ARRIVAL ESTIMATION; CONVOLUTIVE MIXTURES; ALGORITHMS; ROBUST; ICA;
DOI
10.1016/j.specom.2015.01.002
CLC classification
O42 [Acoustics];
Subject classification codes
070206 ; 082403 ;
Abstract
Existing speech source separation approaches rely overwhelmingly on acoustic pressure information acquired with a microphone array. Little attention has been devoted to the use of B-format microphones, which capture both acoustic pressure and pressure gradient, so that direction of arrival (DOA) cues can be estimated from the received signal. In this paper, such DOA cues, together with frequency bin-wise mixing vector (MV) cues, are used to evaluate the contribution of a specific source at each time-frequency (T-F) point of the mixtures in order to separate the source from the mixture. Based on the von Mises mixture model and the complex Gaussian mixture model respectively, a source separation algorithm is developed in which the model parameters are estimated via an expectation-maximization (EM) algorithm. A T-F mask is then derived from the model parameters for recovering the sources. Separation performance is further improved by retaining only the reliable DOA estimates at the T-F units based on thresholding. The proposed method is evaluated in both simulated room environments and a real reverberant studio in terms of signal-to-distortion ratio (SDR) and the perceptual evaluation of speech quality (PESQ). The experimental results show its advantage over four baseline algorithms, including three T-F mask based approaches and one convolutive independent component analysis (ICA) based method. (C) 2015 Elsevier B.V. All rights reserved.
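The EM-based modeling of DOA cues described in the abstract can be illustrated with a minimal sketch (not the authors' implementation, which also incorporates mixing-vector cues and a complex Gaussian mixture model): fit a two-component von Mises mixture to per-T-F DOA estimates via EM and use the component posteriors as a soft T-F mask. All function and variable names here are illustrative.

```python
import numpy as np

def vonmises_pdf(theta, mu, kappa):
    # von Mises density on the circle; np.i0 is the modified Bessel function I0
    return np.exp(kappa * np.cos(theta - mu)) / (2.0 * np.pi * np.i0(kappa))

def em_vonmises_mask(theta, n_iter=50):
    """Fit a 2-component von Mises mixture to DOA angles (radians),
    one angle per T-F point, and return the posterior soft masks."""
    # Farthest-point initialisation of the two component means
    mu = np.array([theta[0], theta[np.argmin(np.cos(theta - theta[0]))]])
    kappa = np.array([2.0, 2.0])      # concentration parameters
    w = np.array([0.5, 0.5])          # mixing weights
    for _ in range(n_iter):
        # E-step: posterior probability of each source at each T-F point
        lik = np.stack([w[k] * vonmises_pdf(theta, mu[k], kappa[k])
                        for k in range(2)])
        gamma = lik / lik.sum(axis=0, keepdims=True)
        # M-step: weighted circular mean and concentration per component
        for k in range(2):
            z = np.sum(gamma[k] * np.exp(1j * theta))
            mu[k] = np.angle(z)
            rbar = min(np.abs(z) / gamma[k].sum(), 0.999)  # guard kappa approx.
            kappa[k] = rbar * (2.0 - rbar**2) / (1.0 - rbar**2)
        w = gamma.mean(axis=1)
    return gamma, mu

# Toy data: noisy DOA estimates for two sources near -60 and +45 degrees
rng = np.random.default_rng(1)
theta = np.concatenate([rng.vonmises(np.deg2rad(-60), 8.0, 500),
                        rng.vonmises(np.deg2rad(45), 8.0, 500)])
mask, mu = em_vonmises_mask(theta)
print(np.rad2deg(np.sort(mu)))  # component means close to the true DOAs
```

In a full pipeline, `mask[k]` (reshaped to the spectrogram grid) would weight the mixture STFT to recover source k; a binary mask is obtained by taking the argmax over components at each T-F point.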
Pages: 41-54
Page count: 14
Related papers
(50 records in total)
  • [21] Online blind speech separation using multiple acoustic speaker tracking and time-frequency masking
    Pertila, P.
    COMPUTER SPEECH AND LANGUAGE, 2013, 27 (03): : 683 - 702
  • [22] An Assessment of the Improvement Potential of Time-Frequency Masking for Speech Dereverberation
    Zheng, Chenxi
    Falk, Tiago H.
    Chan, Wai-Yip
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 212 - +
  • [23] Time-Frequency Masking For Large Scale Robust Speech Recognition
    Wang, Yuxuan
    Misra, Ananya
    Chin, Kean K.
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 2469 - 2473
  • [24] Time-Frequency Masking in the Complex Domain for Speech Dereverberation and Denoising
    Williamson, Donald S.
    Wang, DeLiang
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2017, 25 (07) : 1492 - 1501
  • [25] Cepstral representation of speech motivated by time-frequency masking: An application to speech recognition
    Aikawa, K
    Singer, H
    Kawahara, H
    Tohkura, Y
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1996, 100 (01): : 603 - 614
  • [26] CONVERTING 5.1 AUDIO RECORDINGS TO B-FORMAT FOR DIRECTIONAL AUDIO CODING REPRODUCTION
    Laitinen, Mikko-Ville
    Pulkki, Ville
    2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011, : 61 - 64
  • [27] Separation of Cardiorespiratory Sounds Using Time-Frequency Masking and Sparsity
    Shah, Ghafoor
    Papadias, Constantinos
    2013 18TH INTERNATIONAL CONFERENCE ON DIGITAL SIGNAL PROCESSING (DSP), 2013,
  • [28] Musical Sound Separation Based on Binary Time-Frequency Masking
    Li, Yipeng
    Wang, DeLiang
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2009,
  • [30] A feature study for masking-based reverberant speech separation
    Delfarah, Masood
    Wang, DeLiang
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 555 - 559