52 entries in total
[1]
[Anonymous], 2016, PROC C EMPIRICAL MET
[2]
[Anonymous], 2014, SSST EMNLP
[3]
VQA: Visual Question Answering [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015: 2425-2433
[4]
Video2Text: Learning to Annotate Video Content [J]. 2009 IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS (ICDMW 2009), 2009: 144-151
[5]
MUTAN: Multimodal Tucker Fusion for Visual Question Answering [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017: 2631-2639
[6]
Braude Tom, 2021, ARXIV PREPRINT ARXIV
[7]
Chen SX, 2019, AAAI CONF ARTIF INTE, P8199
[8]
Dual Encoding for Zero-Example Video Retrieval [J]. 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019: 9338-9347
[9]
Duan X, 2018, ADV NEUR IN, V31
[10]
Multi-modal Transformer for Video Retrieval [J]. COMPUTER VISION - ECCV 2020, PT IV, 2020, 12349: 214-229