Single Channel Source Separation with General Stochastic Networks

Cited by: 0

Authors:

Zoehrer, Matthias ^{[1]}

Pernkopf, Franz ^{[1]}

Affiliations:

[1] Graz Univ Technol, Signal Proc & Speech Commun Lab, Graz, Austria

Source:

15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4 | 2014

Funding:

Austrian Science Fund;

Keywords:

general stochastic network; speech separation; speech enhancement; single channel source separation; SPEECH; NOISE; INTELLIGIBILITY; ALGORITHM;

DOI:

Not available

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Single channel source separation (SCSS) is ill-posed and thus challenging. In this paper, we apply general stochastic networks (GSNs), a deep neural network architecture, to SCSS. We extend GSNs to predict a time-frequency representation, i.e., a soft mask, by introducing a hybrid generative-discriminative training objective to the network. We evaluate GSNs on data from the 2nd CHiME speech separation challenge. In particular, we provide results for a speaker-dependent task, a speaker-independent task, a matched noise condition task, and an unmatched noise condition task. Empirically, we compare GSNs to other deep architectures, namely a deep belief network (DBN) and a multi-layer perceptron (MLP). In general, deep architectures perform well on SCSS tasks.
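To make the notion of a soft mask concrete, the following is a minimal sketch, not the authors' method. It assumes a ratio-style mask computed from known speech and noise magnitudes and an additive magnitude mixture; the abstract specifies neither. The function names `soft_mask` and `apply_mask` and the toy values are hypothetical. In the paper's setting the mask would be predicted by the network rather than computed from ground truth.

```python
# Illustrative only: a ratio-style soft mask on toy magnitude spectrograms.
# Assumption: the mask definition and the additive mixture model below are
# chosen for illustration; the abstract does not specify either.

def soft_mask(speech_mag, noise_mag, eps=1e-12):
    """Return a per-bin mask in [0, 1]: speech / (speech + noise)."""
    return [
        [s / (s + n + eps) for s, n in zip(s_row, n_row)]
        for s_row, n_row in zip(speech_mag, noise_mag)
    ]

def apply_mask(mask, mixture_mag):
    """Scale each time-frequency bin of the mixture by its mask value."""
    return [
        [m * x for m, x in zip(m_row, x_row)]
        for m_row, x_row in zip(mask, mixture_mag)
    ]

# Toy spectrograms: 2 time frames x 3 frequency bins (hypothetical values).
speech = [[3.0, 0.0, 1.0], [2.0, 2.0, 0.5]]
noise = [[1.0, 2.0, 1.0], [0.0, 2.0, 1.5]]
mixture = [[s + n for s, n in zip(s_row, n_row)]
           for s_row, n_row in zip(speech, noise)]

mask = soft_mask(speech, noise)
estimate = apply_mask(mask, mixture)
```

Under these assumptions, bins dominated by speech receive mask values near 1 and bins dominated by noise receive values near 0, so the masked mixture approximates the speech magnitudes.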


Pages: 978-982

Page count: 5
