21 entries in total
[1]
Atienza Rowel, 2023, ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), P1, DOI 10.1109/ICASSP49357.2023.10094639
[2]
Du Chenpeng, 2022, arXiv
[3]
Taming Transformers for High-Resolution Image Synthesis [J]. 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021: 12868-12878
[4]
Signal Estimation from Modified Short-Time Fourier Transform [J]. IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1984, 32 (02): 236-243
[5]
Vector Quantized Diffusion Model for Text-to-Image Synthesis [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022: 10686-10696
[6]
Hendrycks Dan, 2016, P ICLR
[7]
ProDiff: Progressive Fast Diffusion Model for High-Quality Text-to-Speech [J]. PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022: 2595-2605
[8]
Huang Rongjie, 2022, arXiv
[9]
Iashin Vladimir, 2021, arXiv
[10]
Ito Keith, The LJ speech dataset