29 entries in total
[2] Joint-Modal Label Denoising for Weakly-Supervised Audio-Visual Video Parsing [J]. COMPUTER VISION, ECCV 2022, PT XXXIV, 2022, 13694: 431-448
[3] Collecting Cross-Modal Presence-Absence Evidence for Weakly-Supervised Audio-Visual Event Perception [J]. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023: 18827-18836
[4] Momentum Contrast for Unsupervised Visual Representation Learning [J]. 2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020), 2020: 9726-9735
[5] Deep Residual Learning for Image Recognition [J]. 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016: 770-778
[6] Hershey S, 2017, INT CONF ACOUST SPEE, P131, DOI 10.1109/ICASSP.2017.7952132
[7] Lamba J, 2021, Arxiv, DOI arXiv:2104.04598
[8] Looking into Your Speech: Learning Cross-modal Affinity for Audio-visual Speech Separation [J]. 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021: 1336-1345
[9] Lin YB, 2021, ADV NEUR IN, V34
[10] DUAL-MODALITY SEQ2SEQ NETWORK FOR AUDIO-VISUAL EVENT LOCALIZATION [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019: 2002-2006