50 entries in total
- [31] Depth-Aware Sparse Transformer for Video-Language Learning. PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023: 4778-4787
- [32] Clover: Towards A Unified Video-Language Alignment and Fusion Model. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023: 14856-14866
- [33] Learning Trajectory-Word Alignments for Video-Language Tasks. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023: 2504-2514
- [34] HiVLP: Hierarchical Interactive Video-Language Pre-Training. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023: 13710-13720
- [36] VideoCon: Robust Video-Language Alignment via Contrast Captions. 2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024: 13927-13937
- [37] Object-aware Video-language Pre-training for Retrieval. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022: 3303-3312
- [38] STOA-VLP: Spatial-Temporal Modeling of Object and Action for Video-Language Pre-training. THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 3, 2023: 3715-3723
- [39] Learning Unified Video-Language Representations via Joint Modeling and Contrastive Learning for Natural Language Video Localization. 2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN, 2023
- [40] Advancing High-Resolution Video-Language Representation with Large-Scale Video Transcriptions. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022: 5026-5035