Reverberant speech separation with probabilistic time-frequency masking for B-format recordings

Cited by: 25
Authors
Chen, Xiaoyi [1]
Wang, Wenwu [2]
Wang, Yingmin [1]
Zhong, Xionghu [3]
Alinaghi, Atiyeh [2]
Affiliations
[1] Northwestern Polytech Univ, Sch Marine Sci & Technol, Dept Acoust Engn, Xian 710072, Peoples R China
[2] Univ Surrey, Dept Elect Engn, Ctr Vis Speech & Signal Proc, Guildford GU2 7XH, Surrey, England
[3] Nanyang Technol Univ, Coll Engn, Sch Comp Engn, Singapore 639798, Singapore
Keywords
B-format signal; Acoustic intensity; Expectation-maximization (EM) algorithm; Blind source separation (BSS); Direction of arrival (DOA); BLIND SOURCE SEPARATION; INDEPENDENT COMPONENT ANALYSIS; DIRECTION-OF-ARRIVAL ESTIMATION; CONVOLUTIVE MIXTURES; ALGORITHMS; ROBUST; ICA;
DOI
10.1016/j.specom.2015.01.002
Chinese Library Classification (CLC)
O42 [Acoustics];
Subject classification codes
070206 ; 082403 ;
Abstract
Existing speech source separation approaches overwhelmingly rely on acoustic pressure information acquired with a microphone array. Little attention has been devoted to the use of B-format microphones, which capture both acoustic pressure and pressure gradient, so that direction of arrival (DOA) cues can be estimated from the received signal. In this paper, such DOA cues, together with frequency bin-wise mixing vector (MV) cues, are used to evaluate the contribution of a specific source at each time-frequency (T-F) point of the mixtures in order to separate that source from the mixture. Based on the von Mises mixture model and the complex Gaussian mixture model respectively, a source separation algorithm is developed in which the model parameters are estimated via an expectation-maximization (EM) algorithm. A T-F mask is then derived from the model parameters for recovering the sources. Moreover, the separation performance is further improved by selecting, via thresholding, only the reliable DOA estimates at the T-F units. The performance of the proposed method is evaluated in both simulated room environments and a real reverberant studio in terms of signal-to-distortion ratio (SDR) and the perceptual evaluation of speech quality (PESQ). The experimental results show its advantage over four baseline algorithms, including three T-F mask based approaches and one convolutive independent component analysis (ICA) based method. (C) 2015 Elsevier B.V. All rights reserved.
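To make the DOA-cue modelling concrete, the sketch below fits a von Mises mixture to per-T-F DOA estimates with EM and turns the posterior responsibilities into soft T-F masks. It is a minimal illustration under stated assumptions, not the authors' implementation: the function name em_von_mises_mask, the random initialization, and the concentration update via the standard mean-resultant-length approximation are illustrative choices, and the complex Gaussian MV model and the DOA reliability thresholding described in the abstract are omitted.

    import numpy as np

    def em_von_mises_mask(theta, K=2, n_iter=30, seed=0):
        """Fit a K-component von Mises mixture to per-T-F DOA estimates
        and return soft masks of shape (K, T, F) plus the fitted means
        and concentrations.

        theta : (T, F) array of DOA angles in radians, one per T-F point.
        Sketch of the DOA-cue part only (illustrative, simplified).
        """
        rng = np.random.default_rng(seed)
        x = theta.ravel()                          # flatten the T-F grid
        pi_k = np.full(K, 1.0 / K)                 # mixture weights
        mu = rng.uniform(-np.pi, np.pi, size=K)    # mean directions
        kappa = np.full(K, 1.0)                    # concentrations

        for _ in range(n_iter):
            # E-step: responsibilities under each von Mises component
            log_pdf = (kappa[:, None] * np.cos(x[None, :] - mu[:, None])
                       - np.log(2 * np.pi * np.i0(kappa))[:, None])
            log_w = np.log(pi_k)[:, None] + log_pdf
            log_w -= log_w.max(axis=0, keepdims=True)
            gamma = np.exp(log_w)
            gamma /= gamma.sum(axis=0, keepdims=True)

            # M-step: update weights, mean directions and concentrations
            Nk = gamma.sum(axis=1)
            pi_k = Nk / x.size
            S = gamma @ np.sin(x)
            C = gamma @ np.cos(x)
            mu = np.arctan2(S, C)
            R = np.sqrt(S**2 + C**2) / Nk          # mean resultant length
            kappa = R * (2 - R**2) / (1 - R**2 + 1e-8)  # standard approximation

        return gamma.reshape(K, *theta.shape), mu, kappa

In such a scheme, the mask value gamma[k, t, f] would weight T-F point (t, f) of the mixture STFT before inverse transformation to recover source k; in the paper the DOA-based posterior is combined with the MV-based complex Gaussian posterior before the mask is formed.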
Pages: 41-54
Number of pages: 14
Related papers (50 records)
  • [41] Blind separation of underdetermined Convolutive speech mixtures by time-frequency masking with the reduction of musical noise of separated signals
    Zohrevandi, Mahbanou
    Setayeshi, Saeed
    Rabiee, Azam
    Reshadi, Midia
    MULTIMEDIA TOOLS AND APPLICATIONS, 2021, 80 (08) : 12601 - 12618
  • [42] Underdetermined Convolutive Blind Source Separation via Time-Frequency Masking
    Reju, Vaninirappuputhenpurayil Gopalan
    Koh, Soo Ngee
    Soon, Ing Yann
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2010, 18 (01): 101 - 116
  • [43] Sound Source Separation by Using Matched Beamforming and Time-Frequency Masking
    Beh, Jounghoon
    Lee, Taekjin
    Han, David
    Ko, Hanseok
    IEEE/RSJ 2010 INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS 2010), 2010,
  • [44] Time-frequency masking for blind source separation with preserved spatial cues
    Pirhosseinloo, Shadi
    Kokkinakis, Kostas
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 1188 - 1192
  • [45] Features for Masking-Based Monaural Speech Separation in Reverberant Conditions
    Delfarah, Masood
    Wang, DeLiang
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2017, 25 (05) : 1085 - 1094
  • [46] Source Separation of Convolutive and Noisy Mixtures Using Audio-Visual Dictionary Learning and Probabilistic Time-Frequency Masking
    Liu, Qingju
    Wang, Wenwu
    Jackson, Philip J. B.
    Barnard, Mark
    Kittler, Josef
    Chambers, Jonathon
    IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2013, 61 (22) : 5520 - 5535
  • [47] Parameter tuning of time-frequency masking algorithms for reverberant artifact removal within the cochlear implant stimulus
    Shahidi, Lidea K.
    Collins, Leslie M.
    Mainsah, Boyla O.
    COCHLEAR IMPLANTS INTERNATIONAL, 2022, 23 (06) : 309 - 316
  • [48] Isolating the energetic component of speech-on-speech masking with ideal time-frequency segregation
    Brungart, Douglas S.
    Chang, Peter S.
    Simpson, Brian D.
    Wang, DeLiang
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2006, 120 (06): 4007 - 4018
  • [49] Independent Component Analysis and Time-Frequency Masking for Speech Recognition in Multitalker Conditions
    Dorothea Kolossa
    Ramon Fernandez Astudillo
    Eugen Hoffmann
    Reinhold Orglmeister
    EURASIP Journal on Audio, Speech, and Music Processing, 2010
  • [50] The Application of Time-Frequency Masking To Improve Intelligibility of Dysarthric Speech in Background Noise
    Borrie, Stephanie A.
    Yoho, Sarah E.
    Healy, Eric W.
    Barrett, Tyson S.
    JOURNAL OF SPEECH LANGUAGE AND HEARING RESEARCH, 2023, 66 (05): 1853 - 1866