48 entries in total
- [2] Castro S, 2020, PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), P4352
- [3] Chen HL, 2020, INT CONF ACOUST SPEE, P721, DOI 10.1109/ICASSP40776.2020.9053174
- [4] Choi S, 2020, ARXIV200503356
- [5] Colas Anthony, 2019, ARXIV191201046
- [6] Duan X, 2018, ADV NEUR IN, V31
- [7] Heterogeneous Memory Enhanced Multimodal Attention Model for Video Question Answering [J]. 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019: 1999-2007
- [8] Motion-Appearance Co-Memory Networks for Video Question Answering [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018: 6576-6585
- [9] Garcia N, 2020, AAAI CONF ARTIF INTE, V34, P10826
- [10] AGQA: A Benchmark for Compositional Spatio-Temporal Reasoning [J]. 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021: 11282-11292