37 entries in total
[1] [Anonymous], 2011, P 49 ANN M ASS COMPU
[2] Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval [J]. 2021 IEEE/CVF International Conference on Computer Vision (ICCV 2021), 2021: 1708-1718
[3] Bertasius G, 2021, PR MACH LEARN RES, V139
[4] Fine-grained Video-Text Retrieval with Hierarchical Graph Reasoning [J]. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2020), 2020: 10635-10644
[5] Devlin J, 2019, 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT 2019), Vol. 1, P4171
[6] Dual Encoding for Zero-Example Video Retrieval [J]. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2019), 2019: 9338-9347
[7] Dosovitskiy A, 2021, 9 INT C LEARN REPR I
[8] Multi-modal Transformer for Video Retrieval [J]. Computer Vision - ECCV 2020, Pt IV, 2020, 12349: 214-229
[9] X-Pool: Cross-Modal Language-Video Attention for Text-Video Retrieval [J]. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022), 2022: 4996-5005
[10] Clover: Towards A Unified Video-Language Alignment and Fusion Model [J]. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023: 14856-14866