85 entries in total
- [11] Blazingly Fast Video Object Segmentation with Pixel-Wise Metric Learning [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018: 1189 - 1198
- [12] Joint-Modal Label Denoising for Weakly-Supervised Audio-Visual Video Parsing [J]. COMPUTER VISION, ECCV 2022, PT XXXIV, 2022, 13694 : 431 - 448
- [13] Cheng JC, 2017, Arxiv, DOI arXiv:1709.04609
- [14] Look, Listen, and Attend: Co-Attention Network for Self-Supervised Audio-Visual Representation Learning [J]. MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020: 3884 - 3892
- [15] Lip Reading Sentences in the Wild [J]. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017: 3444 - 3450
- [17] SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation [J]. 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021: 5908 - 5917
- [18] Faktor Alon, 2014, BRIT MACH VIS C BMVC
- [19] Co-Separating Sounds of Visual Objects [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019: 3878 - 3887
- [20] Learning to Separate Object Sounds by Watching Unlabeled Video [J]. COMPUTER VISION - ECCV 2018, PT III, 2018, 11207 : 36 - 54