Towards Scaling Up Classification-Based Speech Separation

Cited by: 376
Authors
Wang, Yuxuan [1 ]
Wang, DeLiang [1 ,2 ]
Affiliations
[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
[2] Ohio State Univ, Ctr Cognit Sci, Columbus, OH 43210 USA
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2013, Vol. 21, No. 7
Keywords
Computational auditory scene analysis (CASA); deep belief networks; feature learning; monaural speech separation; support vector machines; NOISE; INTELLIGIBILITY; SEGREGATION; ALGORITHM;
DOI
10.1109/TASL.2013.2250961
Chinese Library Classification
O42 [Acoustics];
Discipline Codes
070206; 082403;
Abstract
Formulating speech separation as a binary classification problem has been shown to be effective. While good separation performance is achieved in matched test conditions using kernel support vector machines (SVMs), separation in unmatched conditions involving new speakers and environments remains a significant challenge. A simple yet effective way to cope with the mismatch is to include many different acoustic conditions in the training set. However, large-scale training is nearly intractable for kernel machines due to their computational complexity. To enable training on relatively large datasets, we propose to learn more linearly separable and discriminative features from raw acoustic features and to train linear SVMs, which are much easier and faster to train than kernel SVMs. For feature learning, we employ standard pre-trained deep neural networks (DNNs). The proposed DNN-SVM system can be trained on a variety of acoustic conditions within a reasonable amount of time. Experiments on various test mixtures demonstrate good generalization to unseen speakers and background noises.
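The pipeline the abstract describes, learning features so that time-frequency (T-F) units become more linearly separable and then training a fast linear SVM on those features, can be sketched roughly as follows. Everything here is illustrative: the data is synthetic, the "DNN" is reduced to a single fixed random ReLU layer, and the linear SVM is trained with a simple Pegasos-style hinge-loss subgradient descent. None of these stand-ins are the paper's actual features, network, or solver.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: each T-F unit has a raw acoustic feature vector, and
# its label is 1 if the unit is speech-dominant (ideal binary mask = 1).
n_units, raw_dim, hidden_dim = 2000, 20, 64

# Synthetic raw features: two noisy classes standing in for
# speech-dominant vs. noise-dominant units.
labels = rng.integers(0, 2, n_units)
X_raw = rng.normal(0.0, 1.0, (n_units, raw_dim)) + labels[:, None] * 0.8

# "Feature learning": one fixed random layer with ReLU, a crude stand-in
# for the pre-trained DNN that maps raw features into a space where the
# two classes are more linearly separable.
W = rng.normal(0.0, 1.0 / np.sqrt(raw_dim), (raw_dim, hidden_dim))
X_feat = np.maximum(X_raw @ W, 0.0)

# Linear SVM trained by stochastic subgradient descent on the hinge loss
# (the cheap-to-train final stage, in place of a kernel machine).
w, b = np.zeros(hidden_dim), 0.0
y = 2 * labels - 1                      # hinge loss uses labels in {-1, +1}
lam, lr = 1e-3, 0.05
for epoch in range(20):
    for i in rng.permutation(n_units):
        margin = y[i] * (X_feat[i] @ w + b)
        w *= 1.0 - lr * lam             # L2 regularization step
        if margin < 1:                  # subgradient of the hinge loss
            w += lr * y[i] * X_feat[i]
            b += lr * y[i]

# Estimated binary mask labels for the training units.
pred = (X_feat @ w + b > 0).astype(int)
acc = (pred == labels).mean()
print(f"training accuracy: {acc:.2f}")
```

The design point the sketch illustrates is the one the abstract makes: once the feature map does the nonlinear work, the final classifier can be linear, so training cost grows modestly with dataset size instead of quadratically as with kernel SVMs.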
Pages: 1381-1390
Page count: 10