101 entries in total
[12] How far are we from solving the 2D & 3D Face Alignment problem? (and a dataset of 230,000 3D facial landmarks) [C]. 2017 IEEE International Conference on Computer Vision (ICCV), 2017: 1021-1030.
[14] The devil is in the details: an evaluation of recent feature encoding methods [C]. Proceedings of the British Machine Vision Conference 2011, 2011.
[15] GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio [C]. Interspeech 2021, 2021: 3670-3674.
[16] DiffV2S: Diffusion-based Video-to-Speech Synthesis with Vision-guided Speaker Embedding [C]. 2023 IEEE/CVF International Conference on Computer Vision (ICCV), 2023: 7778-7787.
[17] Chung JS, 2018, INTERSPEECH, P1086
[19] Chung JY, 2014, arXiv, DOI 10.48550/arXiv.1412.3555 (arXiv:1412.3555)
[20] Clifton A, 2020, P 28 INT C COMP LING, P5903