DEEP MIXTURE DENSITY NETWORKS FOR ACOUSTIC MODELING IN STATISTICAL PARAMETRIC SPEECH SYNTHESIS

Cited by: 0
Authors
Zen, Heiga [1]
Senior, Andrew [1]
Affiliations
[1] Google, Mountain View, CA 94043 USA
Source
2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2014
Keywords
Statistical parametric speech synthesis; hidden Markov models; deep neural networks; mixture density networks; SYNTHESIS SYSTEM; HMM;
DOI
Not available
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Statistical parametric speech synthesis (SPSS) using deep neural networks (DNNs) has shown its potential to produce naturally-sounding synthesized speech. However, there are limitations in the current implementation of DNN-based acoustic modeling for speech synthesis, such as the unimodal nature of its objective function and its lack of ability to predict variances. To address these limitations, this paper investigates the use of a mixture density output layer. It can estimate full probability density functions over real-valued output features conditioned on the corresponding input features. Experimental results in objective and subjective evaluations show that the use of the mixture density output layer improves the prediction accuracy of acoustic features and the naturalness of the synthesized speech.
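As an illustration of the idea in the abstract, the following is a minimal sketch of how a mixture density output layer can turn raw network outputs into a conditional density and score a target vector. It assumes diagonal-covariance Gaussian components, a softmax over mixture weights, and log-standard-deviation outputs; these choices, and the names `mdn_params` and `mdn_nll`, are illustrative assumptions and not taken from the paper.

```python
import numpy as np

def _logsumexp(a, axis):
    # Numerically stable log(sum(exp(a))) along one axis.
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))

def mdn_params(z, n_mix, dim):
    """Split raw outputs z of shape (batch, n_mix * (2 * dim + 1)).

    Assumed layout (hypothetical): [weight logits | means | log std devs].
    """
    logits = z[:, :n_mix]
    means = z[:, n_mix:n_mix + n_mix * dim].reshape(-1, n_mix, dim)
    log_sigma = z[:, n_mix + n_mix * dim:].reshape(-1, n_mix, dim)
    # Log-softmax so that the mixture weights sum to one.
    log_w = logits - _logsumexp(logits, axis=1)[:, None]
    return log_w, means, log_sigma

def mdn_nll(z, y, n_mix):
    """Mean negative log-likelihood of targets y (batch, dim) under the mixture."""
    dim = y.shape[1]
    log_w, means, log_sigma = mdn_params(z, n_mix, dim)
    # Standardised residuals per component: shape (batch, n_mix, dim).
    diff = (y[:, None, :] - means) / np.exp(log_sigma)
    # Log density of each diagonal Gaussian component.
    log_comp = (-0.5 * np.sum(diff ** 2, axis=2)
                - np.sum(log_sigma, axis=2)
                - 0.5 * dim * np.log(2.0 * np.pi))
    # Mix the components in the log domain, then average over the batch.
    return -np.mean(_logsumexp(log_w + log_comp, axis=1))
```

With a single component and fixed unit variance, minimising this loss reduces to the squared-error objective of a conventional DNN, which is the unimodal, variance-free case the abstract describes as a limitation.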
Pages: 5