Semi-supervised Single-Channel Speech-Music Separation for Automatic Speech Recognition

Cited by: 0
Authors
Demir, Cemil [1 ,3 ]
Cemgil, A. Taylan [2 ]
Saraclar, Murat [3 ]
Affiliations
[1] TUBITAK BILGEM, Kocaeli, Turkey
[2] Bogazici Univ, Dept Comp Engn, Istanbul, Turkey
[3] Bogazici Univ, Dept Elect & Elect Engn, Istanbul, Turkey
Source
12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5 | 2011
Keywords
speech-music separation; semi-supervised; speech recognition
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this study, we propose a semi-supervised speech-music separation method. Given a segmented audio signal, it uses the speech-only, music-only, and mixed speech-music segments to separate the speech and music signals in the mixed segments. We assume that the background music in the mixed segments is partially composed of repetitions of the music segment in the same audio; therefore, we use a mixture model to represent the music signal. The speech signal is modeled with Non-negative Matrix Factorization (NMF). The prior model of the NMF template matrix is estimated from the speech segment and updated using the mixed segment. The separation performance of the proposed method is evaluated on an automatic speech recognition task.
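The modeling choices in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation. The function names `kl_nmf` and `separate` are hypothetical, and the KL-divergence cost with multiplicative updates is an assumption. The prior on the NMF template matrix is replaced by a simpler stand-in: the speech templates are learned on the speech segment and then refined on the mixed segment. The music mixture model is reduced to assigning each mixed frame to one gain-scaled frame of the music segment (hard assignment). All inputs are assumed to be nonnegative magnitude spectrograms.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero and log(0)


def kl_nmf(V, n_components, n_iter=200, seed=0):
    """Plain KL-divergence NMF with Lee-Seung multiplicative updates: V ~= W @ H."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, n_components)) + 0.1
    H = rng.random((n_components, T)) + 0.1
    for _ in range(n_iter):
        H *= (W.T @ (V / (W @ H + EPS))) / W.sum(axis=0)[:, None]
        W *= ((V / (W @ H + EPS)) @ H.T) / H.sum(axis=1)[None, :]
    return W, H


def separate(X, speech_seg, music_seg, n_speech=8, n_iter=100, seed=0):
    """Hypothetical sketch: split a mixed magnitude spectrogram X (F x T).

    Speech: W @ H, where W is learned on the speech-only segment and then refined
    on the mixture (a stand-in for the paper's prior on the template matrix).
    Music: frame t is modeled as g[t] * M[:, z[t]], a scaled copy of one frame of
    the music-only segment (hard-assignment mixture model, an assumption).
    """
    rng = np.random.default_rng(seed)
    T = X.shape[1]
    W, _ = kl_nmf(speech_seg, n_speech, seed=seed)
    M = music_seg / (music_seg.sum(axis=0, keepdims=True) + EPS)  # unit-sum columns
    H = rng.random((n_speech, T)) + 0.1
    z = rng.integers(0, M.shape[1], size=T)  # music frame assigned to each mixed frame
    g = 0.5 * X.sum(axis=0) + EPS  # per-frame music gain

    def model():
        S = W @ H
        Mu = M[:, z] * g
        return S, Mu, S + Mu + EPS

    history = []  # KL divergence after each sweep; should not increase
    for _ in range(n_iter):
        _, _, V = model()
        H *= (W.T @ (X / V)) / W.sum(axis=0)[:, None]
        _, _, V = model()
        W *= ((X / V) @ H.T) / H.sum(axis=1)[None, :]
        _, _, V = model()
        Mz = M[:, z]
        g *= (Mz * (X / V)).sum(axis=0) / (Mz.sum(axis=0) + EPS)
        # Reassign each frame to the music template with the lowest KL cost.
        S = W @ H
        for t in range(T):
            cand = S[:, t, None] + g[t] * M + EPS  # F x J candidate models
            cost = (cand - X[:, t, None] * np.log(cand)).sum(axis=0)
            z[t] = int(np.argmin(cost))
        _, _, V = model()
        history.append(float((X * np.log((X + EPS) / V) - X + V).sum()))

    S, Mu, V = model()
    return X * S / V, X * Mu / V, history  # Wiener-style masking


# Toy usage on synthetic nonnegative "spectrograms" (illustrative data only).
rng = np.random.default_rng(1)
F, T = 16, 12
speech_basis = rng.random((F, 3))
music_frames = rng.random((F, 4)) + 0.05
speech_seg = speech_basis @ rng.random((3, 20))
music_seg = music_frames[:, rng.integers(0, 4, size=10)]
X = (speech_basis @ rng.random((3, T))
     + music_frames[:, rng.integers(0, 4, size=T)] * rng.uniform(0.5, 2.0, size=T))
speech_est, music_est, hist = separate(X, speech_seg, music_seg, n_speech=3, n_iter=50)
```

Because both estimates come from the same masks, they add up to the mixture. In a recognition setup like the one the abstract describes, the speech estimate could then be passed on to feature extraction. Hard assignment keeps the sketch short. A soft, posterior-weighted assignment would be closer to a probabilistic mixture model.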
Pages: 688+
Number of pages: 2