54 entries in total
[11]
Faghri Fartash, 2017, arXiv
[12]
Fang Han, 2022, IEEE Trans. Multimedia
[13]
Feichtenhofer Christoph, 2022, Advances in Neural Information Processing Systems, P35946
[14]
Multi-modal Transformer for Video Retrieval [J]. COMPUTER VISION - ECCV 2020, PT IV, 2020, 12349: 214-229
[15]
Ganin Y, 2015, PR MACH LEARN RES, V37, P1180
[16]
Gao Peng, 2021, arXiv:2110.04544
[17]
MILES: Visual BERT Pre-training with Injected Language Semantics for Video-Text Retrieval [J]. COMPUTER VISION - ECCV 2022, PT XXXV, 2022, 13695: 691-708
[18]
Gorti S.K., et al., 2022, CVPR, P5006
[19]
Masked Autoencoders Are Scalable Vision Learners [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022: 15979-15988
[20]
Localizing Moments in Video with Natural Language [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017: 5804-5813