63 entries in total
[1]
[Anonymous], 2022, P IEEE CVF C COMP VI, DOI 10.1109/SPIES55999.2022.10082039
[2]
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval [J]. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021: 1708-1718
[3]
Bao Hangbo, 2021, INT C LEARN REPR, V2, P5
[4]
Cai Guanyu, 2022, ARXIV220307303
[5]
Chen D.L., 2011, ACL, V1, P190
[6]
Chen Xinlei, 2015, CoRR abs/1504.00325
[7]
Cheng Xing, 2021, ARXIV210904290
[8]
Devlin J, 2019, 2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, P4171
[9]
Heterogeneous Memory Enhanced Multimodal Attention Model for Video Question Answering [J]. 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019: 1999-2007
[10]
Fu Tsu-Jui, 2021, ARXIV211112681