54 entries in total
[1]
Alayrac Jean-Baptiste, 2022, P NEURIPS NEW ORL
[2]
[Anonymous], 2020, P ADV NEUR INF PROC
[3]
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval [J]. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021: 1708-1718
[4]
Bao Hangbo, 2021, PROC INT C LEARN REP
[5]
Bertasius G, 2021, PR MACH LEARN RES, V139
[6]
Revisiting the "Video" in Video-Language Understanding [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022: 2907-2917
[7]
Chen Jinsong, 2023, ICLR
[8]
UNITER: UNiversal Image-TExt Representation Learning [J]. COMPUTER VISION - ECCV 2020, PT XXX, 2020, 12375: 104-120
[9]
Cheng Feng, 2022, ARXIV220401680
[10]
Fang Han, 2021, AAAI C ARTIFICIAL IN