33 entries in total
- [1] ONE TTS ALIGNMENT TO RULE THEM ALL [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 6092 - 6096
- [2] Baevski A., 2020, ADV NEURAL INFORM PR, V33, P12449
- [3] Busso C., 2008, IEMOCAP: Interactive emotional dyadic motion capture database
- [4] Christophe V., 2017, CSTR VCTK Corpus: English multi-speaker corpus for CSTR voice cloning toolkit
- [5] CONVERSATIONAL END-TO-END TTS FOR VOICE AGENTS [J]. 2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, : 403 - 409
- [6] Huang RJ, 2022, arXiv, DOI arXiv:2205.07211
- [7] Ito K., 2017, The LJ Speech Dataset
- [8] Keon L., 2023, PRC INT C AC, P1
- [9] Kim Jaehyeon, 2020, ADV NEURAL INFORM PR, V33, P8067
- [10] Kingma D. P., 2015, ICLR