66 entries in total
[3] Self-supervised Learning of Audio-Visual Objects from Video [J]. COMPUTER VISION - ECCV 2020, PT XVIII, 2020, 12363: 208-224
[4] My lips are concealed: Audio-visual speech enhancement through obstructions [J]. INTERSPEECH 2019, 2019: 4295-4299
[5] Afouras T, 2018, INTERSPEECH, P3244
[6] End-to-End Active Speaker Detection [J]. COMPUTER VISION, ECCV 2022, PT XXXVII, 2022, 13697: 126-143
[7] Active Speakers in Context [J]. 2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020), 2020: 12462-12471
[8] Baevski A, 2022, PR MACH LEARN RES
[10] Bronkhorst AW, 2000, ACUSTICA, V86, P117