Feature Space VTS with Phase Term Modeling

Times cited: 0
Authors
Korenevsky, Maxim [1, 2]
Romanenko, Aleksei [2]
Affiliations
[1] ITMO Univ, St Petersburg, Russia
[2] STC Innovat Ltd, St Petersburg, Russia
Source
SPEECH AND COMPUTER | 2016 / Vol. 9811
Keywords
Robust speech recognition; Feature compensation; Vector Taylor series; Distortion model; Phase-sensitive; Aurora2; SPEECH; ENVIRONMENT
DOI
10.1007/978-3-319-43958-7_37
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A new variant of the Vector Taylor Series (VTS) based feature compensation algorithm is proposed. A phase-sensitive speech distortion model is used, and the phase term is modeled as a multivariate Gaussian with an unknown mean vector and covariance matrix. These parameters are estimated under the maximum likelihood principle using the EM algorithm. The EM parameter update formulas are derived, as is the MMSE estimate of the clean speech features. Experiments on the Aurora2 database show that taking the phase term into account and estimating its parameters from data yield a relative WER reduction of about 20% compared to the phase-insensitive version of VTS. The proposed method is also compared to VTS with a constant phase vector, and this approximation is shown to be very efficient.
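The phase-sensitive distortion model mentioned in the abstract is commonly written, in the log-mel domain, as y = x + log(1 + exp(n - x) + 2α·exp((n - x)/2)), where x, n and y are the clean-speech, noise and noisy log-mel vectors and α is the phase term (all operations element-wise). The Python sketch below shows how a first-order VTS expansion of this model propagates Gaussian statistics of x, n and α to the noisy features. It is an illustration under stated assumptions, not the authors' EM or MMSE derivation: the channel term is omitted, features are log-mel rather than cepstral, all covariances are diagonal (the abstract allows a full covariance for the phase term), and the function names `distortion`, `jacobians` and `vts_noisy_stats` are hypothetical.

```python
import numpy as np

# Assumptions (not from the paper): log-mel features, no channel term,
# diagonal covariances, and phase factor alpha with |alpha| < 1 so the
# argument of the logarithm stays positive.

def distortion(x, n, alpha):
    """Phase-sensitive mismatch function, applied per mel channel:
    y = x + log(1 + exp(n - x) + 2 * alpha * exp((n - x) / 2))."""
    d = n - x
    return x + np.log1p(np.exp(d) + 2.0 * alpha * np.exp(d / 2.0))

def jacobians(x, n, alpha):
    """Partial derivatives of y with respect to x, n and alpha.
    The mismatch function acts element-wise, so each Jacobian is diagonal
    and is returned as a vector of its diagonal entries."""
    d = n - x
    e, s = np.exp(d), np.exp(d / 2.0)
    denom = 1.0 + e + 2.0 * alpha * s
    g_n = (e + alpha * s) / denom   # dy/dn
    g_x = 1.0 - g_n                 # dy/dx
    g_a = 2.0 * s / denom           # dy/dalpha
    return g_x, g_n, g_a

def vts_noisy_stats(mu_x, var_x, mu_n, var_n, mu_a, var_a):
    """First-order VTS: linearize around the means of x, n and alpha,
    treat them as independent Gaussians, and return the approximate
    mean and (diagonal) variance of the noisy features y."""
    mu_y = distortion(mu_x, mu_n, mu_a)
    g_x, g_n, g_a = jacobians(mu_x, mu_n, mu_a)
    var_y = g_x**2 * var_x + g_n**2 * var_n + g_a**2 * var_a
    return mu_y, var_y
```

In this sketch, setting the phase variance `var_a` to zero treats α as a fixed vector, which corresponds to the constant-phase approximation the abstract compares against; additionally setting `mu_a` to zero reduces the model to phase-insensitive VTS, where y = log(exp(x) + exp(n)).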
Pages: 312-320
Page count: 9