85 entries in total
[12] Blazingly Fast Video Object Segmentation with Pixel-Wise Metric Learning [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018: 1189-1198
[13] Joint-Modal Label Denoising for Weakly-Supervised Audio-Visual Video Parsing [J]. COMPUTER VISION, ECCV 2022, PT XXXIV, 2022, 13694: 431-448
[14] Cheng Jingchun, 2017, ARXIV
[15] Look, Listen, and Attend: Co-Attention Network for Self-Supervised Audio-Visual Representation Learning [J]. MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020: 3884-3892
[16] Lip Reading Sentences in the Wild [J]. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017: 3444-3450
[18] SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation [J]. 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021: 5908-5917
[19] Faktor A., 2014, BRIT MACH VIS C BMVC, V2, P8
[20] Co-Separating Sounds of Visual Objects [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019: 3878-3887