50 entries in total
- [41] Adapt2Reward: Adapting Video-Language Models to Generalizable Robotic Rewards via Failure Prompts [J]. COMPUTER VISION - ECCV 2024, PT LVII, 2025, 15115 : 163 - 180
- [42] All in One: Exploring Unified Video-Language Pre-training [J]. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 6598 - 6608
- [44] PiTe: Pixel-Temporal Alignment for Large Video-Language Model [J]. COMPUTER VISION - ECCV 2024, PT V, 2025, 15063 : 160 - 176
- [46] HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training [J]. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 15359 - 15370
- [47] VLM: Task-agnostic Video-Language Model Pre-training for Video Understanding [J]. FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021, 2021, : 4227 - 4239
- [48] SST-VLM: Sparse Sampling-Twice Inspired Video-Language Model [J]. COMPUTER VISION - ACCV 2022, PT IV, 2023, 13844 : 537 - 553
- [49] SMAUG: Sparse Masked Autoencoder for Efficient Video-Language Pre-training [J]. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 2459 - 2469
- [50] RTQ: Rethinking Video-language Understanding Based on Image-text Model [J]. PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 557 - 566