Blind Speech Separation with GCC-NMF

Cited by: 5

Authors:

Wood, Sean U. N. [1]

Rouat, Jean [1]

Affiliations:

[1] Univ Sherbrooke, GEGI, NECOTIS, Sherbrooke, PQ, Canada

Source:

17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES | 2016

Keywords:

cocktail party problem; blind speech separation; interaural time difference; NMF; GCC; PHAT; CASA; AUDIO SOURCE SEPARATION; NONNEGATIVE MATRIX FACTORIZATION; INFORMATION; MODELS;

DOI:

10.21437/Interspeech.2016-1449

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

We introduce a blind source separation algorithm named GCC-NMF that combines unsupervised dictionary learning via non-negative matrix factorization (NMF) with spatial localization via the generalized cross-correlation (GCC) method. Dictionary learning is performed on the mixture signal, with separation subsequently achieved by grouping dictionary atoms, over time, according to their spatial origins. Separation quality is evaluated using publicly available data from the SiSEC signal separation evaluation campaign consisting of stereo recordings of 3 and 4 concurrent speakers in reverberant environments. Performance is quantified using perceptual and SNR-based measures with the PEASS and BSS Eval toolkits, respectively. We compare our approach with other NMF-based speech separation algorithms including unsupervised and semi-supervised approaches. GCC-NMF outperforms the unsupervised model-based approach that combines NMF with spatial covariance mixture models, and compares favourably to semi-supervised approaches that leverage prior knowledge and information, despite being purely unsupervised itself.
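
The idea summarized in the abstract (learn an NMF dictionary on the mixture, localize each atom with GCC, then group atoms over time by spatial origin) can be illustrated with a rough NumPy sketch. This is not the authors' implementation: the function names (`nmf`, `atom_gcc_phat`, `separate_masks`), the Euclidean multiplicative updates, the PHAT weighting, the sign convention for the delay, and the assumption that candidate target delays are already known are all choices made here for illustration only.

```python
import numpy as np

def nmf(V, n_atoms, n_iter=100, seed=0, eps=1e-12):
    """Euclidean NMF with multiplicative updates: V ~= W @ H, W and H >= 0.
    (Assumed cost function; the abstract does not specify one.)"""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], n_atoms)) + eps
    H = rng.random((n_atoms, V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def atom_gcc_phat(W, XL, XR, freqs, taus, eps=1e-12):
    """Per-atom GCC-PHAT function G[d, k, t].
    XL, XR: complex STFTs of the left/right channels, shape (F, T).
    freqs: bin frequencies in Hz, shape (F,). taus: candidate delays in s.
    Convention assumed here: a peak at tau means the right channel lags
    the left channel by tau."""
    cross = XL * np.conj(XR)
    phat = cross / (np.abs(cross) + eps)  # keep phase only (PHAT)
    steer = np.exp(-2j * np.pi * freqs[:, None] * taus[None, :])  # (F, K)
    # Weight each frequency by the atom's spectrum, then sum over frequency.
    return np.einsum('fd,ft,fk->dkt', W, phat, steer).real

def separate_masks(W, H, G, target_idx, eps=1e-12):
    """Assign each atom, at each frame, to the target delay with the largest
    GCC value, and build one soft time-frequency mask per target."""
    assign = np.argmax(G[:, target_idx, :], axis=1)  # (D, T)
    V_hat = W @ H + eps
    masks = [(W @ (H * (assign == s))) / V_hat for s in range(len(target_idx))]
    return np.stack(masks)  # (n_targets, F, T)
```

In use, `V` would be a magnitude spectrogram of the stereo mixture (for example the sum of the two channel magnitudes), and each mask would be multiplied with the mixture STFT before inversion. Estimating the target delays themselves is left out of this sketch.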

Pages: 3329-3333

Page count: 5
