54 entries in total
[1] ViViT: A Video Vision Transformer [J]. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021: 6816-6826
[2] Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval [J]. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021: 1708-1718
[3] Bao Hangbo, 2021, PROC INT C LEARN REP
[4] Heilbron FC, 2015, PROC CVPR IEEE, P961, DOI 10.1109/CVPR.2015.7298698
[5] Chen D., 2011, ACL, P190
[6] Fine-grained Video-Text Retrieval with Hierarchical Graph Reasoning [J]. 2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020), 2020: 10635-10644
[7] Cheng Xing, 2021, ARXIV210904290
[8] TEACHTEXT: CrossModal Generalized Distillation for Text-Video Retrieval [J]. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021: 11563-11573
[9] Dosovitskiy A., 2020, ICLR 2021
[10] MDMMT: Multidomain Multimodal Transformer for Video Retrieval [J]. 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2021, 2021: 3349-3358