91 references in total
[1] Dong S.-H., Wang H., Human-Computer Interaction, (2003)
[2] Dahl G.E., Yu D., Deng L., Acero A., Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition, IEEE Transactions on Audio, Speech, and Language Processing, 20, 1, pp. 30-42, (2012)
[3] Federico M., Bertoldi N., Cettolo M., IRSTLM: An open source toolkit for handling large scale language models, Proceedings of the Annual Conference of the International Speech Communication Association (Interspeech), pp. 1618-1621, (2008)
[4] Mohri M., Pereira F., Riley M., Weighted finite-state transducers in speech recognition, Computer Speech & Language, 16, 1, pp. 69-88, (2002)
[5] Senior A., Lei X., Fine context, low-rank, softplus deep neural networks for mobile speech recognition, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), (2014)
[6] Zen H., Tokuda K., Black A.W., Statistical parametric speech synthesis, Speech Communication, 51, 11, pp. 1039-1064, (2009)
[7] Wu Y.J., Wang R.H., Minimum generation error training for HMM-based speech synthesis, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), (2006)
[8] Yu K., Young S., Continuous F0 modeling for HMM-based statistical parametric speech synthesis, IEEE Transactions on Audio, Speech, and Language Processing, 19, 5, pp. 1071-1079, (2011)
[9] Zen H., Senior A., Schuster M., Statistical parametric speech synthesis using deep neural networks, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), (2013)
[10] Ernst M.O., Banks M.S., Humans integrate visual and haptic information in a statistically optimal fashion, Nature, 415, 6870, pp. 429-433, (2002)