27 entries in total
[1]
[Anonymous], 2011, P 49 ANN M ASS COMPU
[2]
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021: 1708-1718
[3]
TEACHTEXT: CrossModal Generalized Distillation for Text-Video Retrieval. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021: 11563-11573
[4]
MDMMT: Multidomain Multimodal Transformer for Video Retrieval. 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2021, 2021: 3349-3358
[5]
Fang H., 2021, arXiv, DOI 10.48550/arXiv.2106.11097
[6]
Gabeur V., 2020, COMPUTER VISION ECCV
[7]
X-Pool: Cross-Modal Language-Video Attention for Text-Video Retrieval. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022: 4996-5005
[8]
Jiang Jie, 2022, IEEE Access
[9]
Kay W, 2017, Arxiv, DOI arXiv:1705.06950
[10]
Stacked Cross Attention for Image-Text Matching. COMPUTER VISION - ECCV 2018, PT IV, 2018, 11208: 212-228