30 entries in total
[1] [Anonymous], 2011, P 49 ANN M ASS COMPU
[2] Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval [J]. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021: 1708-1718
[3] Heilbron FC, 2015, PROC CVPR IEEE, P961, DOI 10.1109/CVPR.2015.7298698
[4] CLIP-Art: Contrastive Pre-training for Fine-Grained Art Classification [J]. 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2021, 2021: 3951-3955
[5] TEACHTEXT: CrossModal Generalized Distillation for Text-Video Retrieval [J]. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021: 11563-11573
[6] Fang Han, 2021, Arxiv
[7] Bridging Video-text Retrieval with Multiple Choice Questions [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022: 16146-16155
[8] X-Pool: Cross-Modal Language-Video Attention for Text-Video Retrieval [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022: 4996-5005
[9] Lightweight Attentional Feature Fusion: A New Baseline for Text-to-Video Retrieval [J]. COMPUTER VISION - ECCV 2022, PT XIV, 2022, 13674: 444-461
[10] Scaling Up Vision-Language Pre-training for Image Captioning [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022: 17959-17968