31 entries in total
[1]
Afouras T, 2018, arXiv, DOI arXiv:1809.00496
[3]
Choi HS, 2022, 11 INT C LEARNING
[4]
Choi HS, 2019, INT C LEARNING REPR
[5]
Choi HS, 2021, ADV NEUR IN, V34
[6]
Dinh L, 2017, arXiv, DOI 10.48550/arXiv.1605.08803
[7]
Dosovitskiy A, 2021, arXiv, DOI arXiv:2010.11929
[8]
Face2Speech: Towards Multi-Speaker Text-to-Speech Synthesis Using an Embedding Vector Predicted from a Face Image [J]. INTERSPEECH 2020, 2020: 1321-1325
[9]
Voice Conversion from Non-parallel Corpora Using Variational Auto-encoder [J]. 2016 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2016,