Reverberant speech separation with probabilistic time-frequency masking for B-format recordings

Cited by: 25
Authors
Chen, Xiaoyi [1]
Wang, Wenwu [2]
Wang, Yingmin [1]
Zhong, Xionghu [3]
Alinaghi, Atiyeh [2]
Affiliations
[1] Northwestern Polytech Univ, Sch Marine Sci & Technol, Dept Acoust Engn, Xian 710072, Peoples R China
[2] Univ Surrey, Dept Elect Engn, Ctr Vis Speech & Signal Proc, Guildford GU2 7XH, Surrey, England
[3] Nanyang Technol Univ, Coll Engn, Sch Comp Engn, Singapore 639798, Singapore
Keywords
B-format signal; Acoustic intensity; Expectation-maximization (EM) algorithm; Blind source separation (BSS); Direction of arrival (DOA); BLIND SOURCE SEPARATION; INDEPENDENT COMPONENT ANALYSIS; OF-ARRIVAL ESTIMATION; CONVOLUTIVE MIXTURES; ALGORITHMS; ROBUST; ICA;
DOI
10.1016/j.specom.2015.01.002
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Existing speech source separation approaches overwhelmingly rely on acoustic pressure information acquired by using a microphone array. Little attention has been devoted to the usage of B-format microphones, by which both acoustic pressure and pressure gradient can be obtained, and therefore the direction of arrival (DOA) cues can be estimated from the received signal. In this paper, such DOA cues, together with the frequency bin-wise mixing vector (MV) cues, are used to evaluate the contribution of a specific source at each time frequency (T-F) point of the mixtures in order to separate the source from the mixture. Based on the von Mises mixture model and the complex Gaussian mixture model respectively, a source separation algorithm is developed, where the model parameters are estimated via an expectation-maximization (EM) algorithm. A T-F mask is then derived from the model parameters for recovering the sources. Moreover, we further improve the separation performance by choosing only the reliable DOA estimates at the T-F units based on thresholding. The performance of the proposed method is evaluated in both simulated room environments and a real reverberant studio in terms of signal-to-distortion ratio (SDR) and the perceptual evaluation of speech quality (PESQ). The experimental results show its advantage over four baseline algorithms including three T-F mask based approaches and one convolutive independent component analysis (ICA) based method. (C) 2015 Elsevier B.V. All rights reserved.
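The DOA branch of the approach described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes 2-D (azimuth-only) B-format STFT channels `W`, `X`, `Y` in which `X` and `Y` are in phase with `W` for a source on the positive axis, it replaces the exact concentration update with a common closed-form approximation, and it omits the mixing vector (complex Gaussian) cue and the reliability thresholding. The function names `intensity_doa` and `fit_von_mises_mixture` are hypothetical.

```python
import numpy as np


def intensity_doa(W, X, Y):
    """Azimuth (radians) at each T-F point from the active intensity vector.

    Assumes X and Y are in phase with W for a source on the positive axis.
    """
    Ix = np.real(np.conj(W) * X)
    Iy = np.real(np.conj(W) * Y)
    return np.arctan2(Iy, Ix)


def _responsibilities(theta, mu, kappa, weights):
    """Posterior probability of each component for each angle (E-step)."""
    log_p = (np.log(weights)
             + kappa * np.cos(theta[:, None] - mu)
             - np.log(2 * np.pi * np.i0(kappa)))
    log_p -= log_p.max(axis=1, keepdims=True)  # avoid overflow in exp
    resp = np.exp(log_p)
    return resp / resp.sum(axis=1, keepdims=True)


def fit_von_mises_mixture(theta, n_sources, n_iter=50, kappa_max=50.0):
    """Fit a von Mises mixture to DOA estimates with EM.

    Returns means, concentrations, weights, and soft masks of shape
    theta.shape + (n_sources,).
    """
    shape = np.shape(theta)
    theta = np.ravel(theta)
    K = n_sources
    # Assumption: means start evenly spaced around the circle.
    mu = -np.pi + 2 * np.pi * (np.arange(K) + 0.5) / K
    kappa = np.ones(K)
    weights = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        resp = _responsibilities(theta, mu, kappa, weights)
        # M-step: weighted circular mean and mean resultant length.
        Nk = resp.sum(axis=0) + 1e-12
        C = resp.T @ np.cos(theta)
        S = resp.T @ np.sin(theta)
        mu = np.arctan2(S, C)
        R = np.clip(np.hypot(C, S) / Nk, 1e-6, 1 - 1e-6)
        # Closed-form approximation to the concentration (an assumption
        # of this sketch), capped for numerical stability.
        kappa = np.minimum(R * (2 - R ** 2) / (1 - R ** 2), kappa_max)
        weights = Nk / Nk.sum()
    masks = _responsibilities(theta, mu, kappa, weights)
    return mu, kappa, weights, masks.reshape(shape + (K,))
```

In this sketch, multiplying `masks[..., k]` by the STFT of the `W` channel and inverting the transform would give an estimate of source `k`. The abstract's method additionally combines these DOA posteriors with mixing vector cues before forming the mask.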
Pages: 41-54
Number of pages: 14
Related papers
50 in total
  • [41] Blind separation of frequency-hopping signals based on time-frequency distribution
    Feng T.
    Yuan C.-W.
    Xi Tong Gong Cheng Yu Dian Zi Ji Shu/Systems Engineering and Electronics, 2010, 32 (05): : 900 - 903
  • [42] ONLINE BLIND SOURCE SEPARATION BASED ON TIME-FREQUENCY SPARSENESS
    Loesch, Benedikt
    Yang, Bin
    2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1- 8, PROCEEDINGS, 2009, : 117 - 120
  • [43] Multi-Source DOA Estimation in Reverberant Environments by Jointing Detection and Modeling of Time-Frequency Points
    Jia, Maoshen
    Wu, Yuxuan
    Bao, Changchun
    Ritz, Christian
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29 : 379 - 392
  • [44] UNDERDETERMINED SOURCE SEPARATION USING TIME-FREQUENCY MASKS AND AN ADAPTIVE COMBINED GAUSSIAN-STUDENT'S T PROBABILISTIC MODEL
    Sun, Yang
    Rafique, Waqas
    Chambers, Jonathan A.
    Naqvi, Syed Mohsen
    2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 4187 - 4191
  • [45] Blind source separation of acoustic mixtures using time-frequency domain independent component analysis
    Jayaraman, S
    Sitaraman, G
    Seshadri, R
    ICCS 2002: 8TH INTERNATIONAL CONFERENCE ON COMMUNICATIONS SYSTEMS, VOLS 1 AND 2, PROCEEDINGS, 2002, : 1016 - 1019
  • [46] Two contributions to blind source separation using time-frequency distributions
    Févotte, C
    Doncarli, C
    IEEE SIGNAL PROCESSING LETTERS, 2004, 11 (03) : 386 - 389
  • [47] Blind separation of underdetermined convolutive mixtures using their time-frequency representation
    Aissa-El-Bey, Abdeldjalil
    Abed-Meraim, Karim
    Grenier, Yves
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2007, 15 (05): : 1540 - 1550
  • [48] Time-Frequency Filter Bank: A Simple Approach for Audio and Music Separation
    Yang, Ning
    Usman, Muhammad
    He, Xiangjian
    Jan, Mian Ahmad
    Zhang, Liming
    IEEE ACCESS, 2017, 5 : 27114 - 27125
  • [49] A combined approach using subspace and beamforming methods for time-frequency domain blind source separation
    Ichijo, Akihiro
    Hamada, Takehiro
    Tabaru, Tetsuya
    Nakano, Kazushi
    2006 SICE-ICASE INTERNATIONAL JOINT CONFERENCE, VOLS 1-13, 2006, : 1437 - +
  • [50] Blind deconvolution of speech mixtures based on time-frequency processing and statistical analysis
    Hu, K
    Wang, ZF
    INTERNATIONAL SYMPOSIUM ON COMMUNICATIONS AND INFORMATION TECHNOLOGIES 2005, VOLS 1 AND 2, PROCEEDINGS, 2005, : 278 - 281