71 references in total
[1] [Anonymous], 2011, P 49 ANN M ASS COMPU
[2] VQA: Visual Question Answering [C]. 2015 IEEE International Conference on Computer Vision (ICCV), 2015: 2425-2433.
[3] Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval [C]. 2021 IEEE/CVF International Conference on Computer Vision (ICCV 2021), 2021: 1708-1718.
[4] Learning the Best Pooling Strategy for Visual Semantic Embedding [C]. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021), 2021: 15784-15793.
[5] Fine-grained Video-Text Retrieval with Hierarchical Graph Reasoning [C]. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2020), 2020: 10635-10644.
[6] UNITER: UNiversal Image-TExt Representation Learning [C]. Computer Vision - ECCV 2020, Part XXX, 2020, 12375: 104-120.
[7] Cheng Xing, 2021, arXiv
[8] TeachText: CrossModal Generalized Distillation for Text-Video Retrieval [C]. 2021 IEEE/CVF International Conference on Computer Vision (ICCV 2021), 2021: 11563-11573.
[9] Devlin J, 2019, arXiv:1810.04805