Blind Speech Separation with GCC-NMF

Times cited: 5
Authors
Wood, Sean U. N. [1]
Rouat, Jean [1]
Affiliation
[1] Univ Sherbrooke, GEGI, NECOTIS, Sherbrooke, PQ, Canada
Source
17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES | 2016
Keywords
cocktail party problem; blind speech separation; interaural time difference; NMF; GCC; PRAT; CASA; AUDIO SOURCE SEPARATION; NONNEGATIVE MATRIX FACTORIZATION; INFORMATION; MODELS
DOI
10.21437/Interspeech.2016-1449
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
We introduce a blind source separation algorithm named GCC-NMF that combines unsupervised dictionary learning via non-negative matrix factorization (NMF) with spatial localization via the generalized cross-correlation (GCC) method. Dictionary learning is performed on the mixture signal, and separation is subsequently achieved by grouping dictionary atoms, over time, according to their spatial origins. Separation quality is evaluated using publicly available data from the SiSEC signal separation evaluation campaign, consisting of stereo recordings of 3 and 4 concurrent speakers in reverberant environments. Performance is quantified using perceptual and SNR-based measures with the PEASS and BSS Eval toolkits, respectively. We compare our approach with other NMF-based speech separation algorithms, including unsupervised and semi-supervised approaches. GCC-NMF outperforms the unsupervised model-based approach that combines NMF with spatial covariance mixture models, and compares favourably with semi-supervised approaches that leverage prior knowledge and information, despite being purely unsupervised itself.
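The pipeline summarized in the abstract (NMF dictionary learning on the mixture, GCC-based localization, then grouping atoms over time by spatial origin) can be illustrated with the minimal NumPy sketch below. It is not the authors' implementation, and several details are assumptions: NMF is run on the concatenated left/right magnitude spectrograms with Euclidean multiplicative updates, each atom's GCC-PHAT response is weighted by its spectral shape, the target time differences of arrival (TDOAs, in seconds) are taken as given rather than estimated, and each atom is assigned per frame to the target with the largest GCC value (winner-take-all). The function names `nmf`, `gcc_phat_atoms`, and `gcc_nmf_masks` are hypothetical.

```python
import numpy as np


def nmf(V, n_atoms, n_iter=200, eps=1e-12, seed=0):
    """Lee-Seung multiplicative updates minimizing ||V - W H||_F^2 (assumed cost)."""
    rng = np.random.default_rng(seed)
    F, N = V.shape
    W = rng.random((F, n_atoms)) + eps
    H = rng.random((n_atoms, N)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def gcc_phat_atoms(X_l, X_r, W, freqs, taus):
    """Atom-weighted GCC-PHAT: G[k, t, d] = Re sum_f W[f, k] * PHAT[f, t] * exp(j 2 pi f tau_d).

    A positive tau means the left channel lags the right channel.
    """
    cross = X_l * np.conj(X_r)
    phat = cross / (np.abs(cross) + 1e-12)                # unit-magnitude phase differences, (F, T)
    steer = np.exp(2j * np.pi * np.outer(freqs, taus))    # candidate-delay steering terms, (F, D)
    return np.real(np.einsum("fk,ft,fd->ktd", W, phat, steer))


def gcc_nmf_masks(X_l, X_r, freqs, target_taus, n_atoms=32, n_iter=200):
    """Return one soft time-frequency mask per target TDOA, shape (S, F, T)."""
    F, T = X_l.shape
    V = np.concatenate([np.abs(X_l), np.abs(X_r)], axis=1)   # stereo magnitudes side by side, (F, 2T)
    W, H = nmf(V, n_atoms, n_iter)
    H_mean = 0.5 * (H[:, :T] + H[:, T:])                     # average left/right activations, (K, T)
    G = gcc_phat_atoms(X_l, X_r, W, freqs, np.asarray(target_taus))  # (K, T, S)
    winner = np.argmax(G, axis=2)                            # target chosen for each atom in each frame
    total = W @ H_mean + 1e-12                               # full NMF reconstruction, (F, T)
    masks = np.empty((len(target_taus), F, T))
    for s in range(len(target_taus)):
        selected = H_mean * (winner == s)                    # keep only atoms assigned to target s
        masks[s] = (W @ selected) / total
    return masks
```

Because every atom is assigned to exactly one target in each frame, the masks sum to one across targets. Multiplying each mask with both channel STFTs and inverting them (for example with `scipy.signal.istft`) would give stereo estimates of each source.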
Pages: 3329-3333
Page count: 5