Single Channel Source Separation with General Stochastic Networks

Cited by: 0
Authors
Zoehrer, Matthias [1]
Pernkopf, Franz [1]
Affiliations
[1] Graz Univ Technol, Signal Proc & Speech Commun Lab, Graz, Austria
Source
15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4 | 2014
Funding
Austrian Science Fund;
Keywords
general stochastic network; speech separation; speech enhancement; single channel source separation; SPEECH; NOISE; INTELLIGIBILITY; ALGORITHM;
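DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract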
Single channel source separation (SCSS) is ill-posed and thus challenging. In this paper, we apply general stochastic networks (GSNs), a deep neural network architecture, to SCSS. We extend GSNs to predict a time-frequency representation, i.e. a soft mask, by introducing a hybrid generative-discriminative training objective to the network. We evaluate GSNs on data from the 2nd CHiME speech separation challenge. In particular, we provide results for a speaker-dependent task, a speaker-independent task, a matched noise condition task, and an unmatched noise condition task. Empirically, we compare to other deep architectures, namely a deep belief network (DBN) and a multi-layer perceptron (MLP). In general, deep architectures perform well on SCSS tasks.
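As an illustration of the soft mask mentioned in the abstract, the following minimal sketch shows how a soft mask over time-frequency bins can be computed and applied to a mixture. The abstract does not give the mask formula, so the ratio definition used here (speech magnitude divided by the sum of speech and noise magnitudes) is an assumption, and the names `soft_mask` and `apply_mask` are hypothetical. In the paper's setting a network would predict the mask from the mixture alone; here it is computed from known sources only to show what the prediction target could look like.

```python
from typing import List

# A magnitude spectrogram as nested lists: [time frame][frequency bin].
Spectrogram = List[List[float]]


def soft_mask(speech_mag: Spectrogram, noise_mag: Spectrogram,
              eps: float = 1e-12) -> Spectrogram:
    """Ratio-style soft mask in [0, 1] per time-frequency bin.

    Assumed definition (not taken from the paper): speech / (speech + noise).
    eps avoids division by zero in silent bins.
    """
    return [
        [s / (s + n + eps) for s, n in zip(s_row, n_row)]
        for s_row, n_row in zip(speech_mag, noise_mag)
    ]


def apply_mask(mask: Spectrogram, mixture_mag: Spectrogram) -> Spectrogram:
    """Estimate the speech magnitudes by element-wise masking of the mixture."""
    return [
        [m * x for m, x in zip(m_row, x_row)]
        for m_row, x_row in zip(mask, mixture_mag)
    ]


# Toy example with 2 frames and 2 frequency bins.
speech = [[1.0, 0.0], [3.0, 2.0]]
noise = [[1.0, 2.0], [1.0, 2.0]]
# Simplifying assumption: mixture magnitudes add up bin by bin.
mixture = [[s + n for s, n in zip(sr, nr)] for sr, nr in zip(speech, noise)]

mask = soft_mask(speech, noise)
estimate = apply_mask(mask, mixture)
```

Under the additive-magnitude assumption, masking the mixture recovers the speech magnitudes up to the small `eps` term. With real signals, magnitudes do not add exactly, so the masked estimate is only approximate.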
Pages: 978 - 982
Page count: 5