48 references in total
[1]
Wang Y, Skerry-Ryan R, Stanton D, et al., Tacotron: Towards end-to-end speech synthesis, Proceedings of the 18th Annual Conference of the International Speech Communication Association, pp. 4006-4010, (2017)
[2]
Gibiansky A, Arik S, Diamos G, et al., Deep voice 2: Multi-speaker neural text-to-speech, Advances in Neural Information Processing Systems, pp. 2962-2970, (2017)
[3]
Taigman Y, Wolf L, Polyak A, et al., VoiceLoop: Voice fitting and synthesis via a phonological loop, (2017)
[4]
van den Oord A, Dieleman S, Zen H, et al., WaveNet: A generative model for raw audio, Proceedings of the 9th ISCA Speech Synthesis Workshop, (2016)
[5]
Chung Y A, Wang Y, Hsu W N, et al., Semi-supervised training for improving data efficiency in end-to-end speech synthesis, Proceedings of the 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2019), pp. 6940-6944, (2019)
[6]
Fong J, Gallegos P O, Hodari Z, et al., Investigating the robustness of sequence-to-sequence text-to-speech models to imperfectly-transcribed training data, Proceedings of the 20th Annual Conference of the International Speech Communication Association, pp. 1546-1550, (2019)
[7]
Jia Y, Zhang Y, Weiss R, et al., Transfer learning from speaker verification to multispeaker text-to-speech synthesis, Advances in Neural Information Processing Systems, pp. 4480-4490, (2018)
[8]
Chen Y, Assael Y, Shillingford B, et al., Sample efficient adaptive text-to-speech, (2018)
[9]
Kalchbrenner N, Elsen E, Simonyan K, et al., Efficient neural audio synthesis, Proceedings of the 35th International Conference on Machine Learning, pp. 2415-2424, (2018)
[10]
Nachmani E, Polyak A, Taigman Y, et al., Fitting new speakers based on a short untranscribed sample, Proceedings of the 35th International Conference on Machine Learning, pp. 3680-3688, (2018)