35 references
[1]
Adiwardana D., Luong M.-T., So D. R., Hall J., Fiedel N., Thoppilan R., Yang Z., Kulshreshtha A., Nemade G., Lu Y., Le Q. V., Towards a human-like open-domain chatbot, (2020)
[2]
Akuzawa K., Iwasawa Y., Matsuo Y., Expressive speech synthesis via modeling expressions with variational autoencoder, Proc. Interspeech, pp. 3067-3071, (2018)
[3]
Cai X., Dai D., Wu Z., Li X., Li J., Meng H., Emotion controllable speech synthesis using emotion-unlabeled dataset with the assistance of cross-domain speech emotion recognition, Proc. ICASSP, pp. 5734-5738, (2021)
[4]
Church K. W., Hanks P., Word association norms, mutual information, and lexicography, Computational Linguistics, 16, 1, pp. 22-29, (1990)
[5]
Cui C., Ren Y., Liu J., Chen F., Huang R., Lei M., Zhao Z., EMOVIE: A Mandarin emotion speech dataset with a simple emotional text-to-speech model, Proc. Interspeech, pp. 2766-2770, (2021)
[6]
Devlin J., Chang M.-W., Lee K., Toutanova K., BERT: Pre-training of deep bidirectional transformers for language understanding, Proc. NAACL, pp. 4171-4186, (2019)
[7]
Hayashi T., Watanabe S., Toda T., Takeda K., Toshniwal S., Livescu K., Pre-trained text embeddings for enhanced text-to-speech synthesis, Proc. Interspeech, pp. 4430-4434, (2019)
[8]
Hojo N., Ijima Y., Mizuno H., DNN-based speech synthesis using speaker codes, IEICE Transactions on Information and Systems, E101.D, 2, pp. 462-472, (2018)
[9]
Inoue K., Hara S., Abe M., Hojo N., Ijima Y., Model architectures to extrapolate emotional expressions in DNN-based text-to-speech, Speech Communication, 126, pp. 35-43, (2021)
[10]
Kanagawa H., Ijima Y., Multi-sample subband WaveRNN via multivariate Gaussian, Proc. ICASSP, pp. 8427-8431, (2022)