85 entries in total
[1] Self-supervised Learning of Audio-Visual Objects from Video [J]. COMPUTER VISION - ECCV 2020, PT XVIII, 2020, 12363: 208-224
[2] Look, Listen and Learn [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017: 609-617
[4] Aytar Y, 2016, ADV NEUR IN, V29
[5] End-to-End Referring Video Object Segmentation with Multimodal Transformers [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022: 4975-4985
[6] One-Shot Video Object Segmentation [J]. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017: 5320-5329
[7] Chen DL, 2021, ADV NEUR IN
[8] Localizing Visual Sounds the Hard Way [J]. 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021: 16862-16871
[9] Chen HL, 2020, INT CONF ACOUST SPEE, P721, DOI 10.1109/ICASSP40776.2020.9053174