63 entries in total
[1]
[Anonymous], 2015, Microsoft COCO captions: Data collection and evaluation server
[2]
[Anonymous], 2022, P IEEE CVF C COMP VI, DOI 10.1109/SPIES55999.2022.10082039
[3]
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval [J]. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021: 1708-1718
[4]
Bao H., 2021, INT C LEARN REPR
[5]
Bird S., 2006, COL ACL 2006 21 INT
[6]
Chen D., 2011, ACL, P190
[7]
Cheng Xing, 2021, ARXIV210904290
[8]
Devlin J, 2019, 2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, P4171
[9]
Heterogeneous Memory Enhanced Multimodal Attention Model for Video Question Answering [J]. 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019: 1999-2007
[10]
Fu Tsu-Jui, 2021, ARXIV211112681