Speech Recognition Using Factorial Hidden Markov Models for Separation in the Feature Space

Cited by: 0
Authors
Virtanen, Tuomas [1 ]
Affiliation
[1] Tampere Univ Technol, Inst Signal Proc, FIN-33101 Tampere, Finland
Source
INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5 | 2006
Keywords
speech recognition; speech separation; factorial hidden Markov model;
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper proposes an algorithm for the recognition and separation of speech signals in non-stationary noise, such as another speaker. We present a method to combine hidden Markov models (HMMs) trained for the speech and noise into a factorial HMM to model the mixture signal. Robustness is obtained by separating the speech and noise signals in a feature domain, which discards unnecessary information. We use mel-frequency cepstral coefficients (MFCCs) as features, and estimate the distribution of mixture MFCCs from the distributions of the target speech and noise. A decoding algorithm is proposed for finding the state transition paths and estimating gains for the speech and noise from a mixture signal. Simulations were carried out using speech material in which two speakers were mixed at various levels. Even at a high noise level (9 dB above the speech level), the method produced relatively good results (60% word recognition accuracy). Audio demonstrations are available at www.cs.tut.fi/~tuomasv.
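The abstract's idea of decoding state paths in a factorial HMM can be illustrated with a minimal sketch. It is not the paper's decoder: it only runs a joint Viterbi search over pairs of speech and noise states, and it assumes the per-frame log-likelihood of each state pair is already given. The gain estimation and the mixture MFCC distribution described in the abstract are not modeled here. The function name `factorial_viterbi` and all parameter names are hypothetical.

```python
import math

def factorial_viterbi(log_a1, log_a2, log_pi1, log_pi2, log_b):
    """Most likely joint state path for two independent Markov chains.

    log_a1[i][k], log_a2[j][l]: log transition probabilities of each chain.
    log_pi1[i], log_pi2[j]: log initial-state probabilities.
    log_b[t][i][j]: log-likelihood of frame t given the state pair (i, j)
                    (assumed to be supplied by some mixture model).
    """
    n1, n2 = len(log_pi1), len(log_pi2)
    states = [(i, j) for i in range(n1) for j in range(n2)]
    delta = {(i, j): log_pi1[i] + log_pi2[j] + log_b[0][i][j] for i, j in states}
    backptrs = []
    for t in range(1, len(log_b)):
        new_delta, back = {}, {}
        for k, l in states:
            # The chains are independent, so joint transition log-probs add.
            best = max(states,
                       key=lambda s: delta[s] + log_a1[s[0]][k] + log_a2[s[1]][l])
            back[(k, l)] = best
            new_delta[(k, l)] = (delta[best] + log_a1[best[0]][k]
                                 + log_a2[best[1]][l] + log_b[t][k][l])
        delta = new_delta
        backptrs.append(back)
    # Trace back from the best final joint state.
    state = max(states, key=lambda s: delta[s])
    path = [state]
    for back in reversed(backptrs):
        state = back[state]
        path.append(state)
    return path[::-1]

# Toy usage: two 2-state chains with "sticky" transitions (illustrative values).
stay, move = math.log(0.9), math.log(0.1)
log_a = [[stay, move], [move, stay]]
log_pi = [math.log(0.5)] * 2
targets = [(0, 1), (0, 1), (1, 0)]
log_b = [[[0.0 if (i, j) == tgt else -10.0 for j in range(2)]
          for i in range(2)] for tgt in targets]
path = factorial_viterbi(log_a, log_a, log_pi, log_pi, log_b)
print(path)  # [(0, 1), (0, 1), (1, 0)]
```

The joint search visits every pair of states, so its cost grows with the product of the two state counts. This is acceptable for a toy example but is one reason practical factorial-HMM decoders typically rely on pruning or approximations.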
Pages: 89-92
Number of pages: 4