64 entries in total
- [1] Achiam J., 2023, GPT-4 technical report, DOI 10.48550/arXiv.2303.08774
- [2] Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval. 2021 IEEE/CVF International Conference on Computer Vision (ICCV 2021), 2021, pp. 1708-1718
- [3] Bao P., 2022, arXiv
- [4] Collins R.T., 2000, VSAM Final Rep., P1
- [5] Moment Detection in Long Tutorial Videos. 2023 IEEE/CVF International Conference on Computer Vision (ICCV), 2023, pp. 2594-2604
- [6] Duan X, 2018, ADV NEUR IN, V31
- [7] TALL: Temporal Activity Localization via Language Query. 2017 IEEE International Conference on Computer Vision (ICCV), 2017, pp. 5277-5285
- [9] Gemini Team, 2024, arXiv
- [10] From Images to Textual Prompts: Zero-shot Visual Question Answering with Frozen Large Language Models. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023, pp. 10867-10877