32 entries in total
[1]
Afouras T, 2020, INT CONF ACOUST SPEE, P2143, DOI 10.1109/ICASSP40776.2020.9054253
[2]
Afouras Triantafyllos, 2018, ARXIV
[3]
Emerging Properties in Self-Supervised Vision Transformers [J]. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021: 9630-9640
[4]
Chung JS, 2018, INTERSPEECH, P1086
[5]
Looking to Listen at the Cocktail Party: A Speaker-Independent Audio-Visual Model for Speech Separation [J]. ACM TRANSACTIONS ON GRAPHICS, 2018, 37 (04)
[6]
Grill J-B, 2020, PROC ADV NEURAL INF, V33, P21271
[7]
Haliassos Alexandros, 2023, ICLR
[8]
Deep Residual Learning for Image Recognition [J]. 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016: 770-778
[9]
Hsu WN, 2022, ADV NEUR IN
[10]
Deep Networks with Stochastic Depth [J]. COMPUTER VISION - ECCV 2016, PT IV, 2016, 9908: 646-661