24 references in total
[1]
Afouras T., 2018, arXiv preprint arXiv:1809.00496
[3]
Bahdanau D, 2016, Arxiv, DOI arXiv:1409.0473
[4]
A CLOSER LOOK AT AUDIO-VISUAL MULTI-PERSON SPEECH RECOGNITION AND ACTIVE SPEAKER SELECTION [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021: 6863-6867
[5]
Braga O, 2020, INT CONF ACOUST SPEE, P6994, DOI 10.1109/ICASSP40776.2020.9053974
[6]
Caruana R.A., 1993, P 10 INT C INT C MAC, P41, DOI 10.1016/B978-1-55860-307-3.50012-5
[7]
Out of Time: Automated Lip Sync in the Wild [J]. COMPUTER VISION - ACCV 2016 WORKSHOPS, PT II, 2017, 10117: 251-263
[8]
Chung Joon Son, 2017, 2017 IEEE C COMP VIS
[9]
Chung Soo-Whan, 2019, ICASSP 2019 2019 IEE
[10]
Graves A., 2012, ICML