34 entries in total
[1]
Alamri H, 2018, P AAAI WORKSH, V2, P1
[2]
VQA: Visual Question Answering [C]. 2015 IEEE International Conference on Computer Vision (ICCV), 2015: 2425-2433.
[3]
Look, Listen and Learn [C]. 2017 IEEE International Conference on Computer Vision (ICCV), 2017: 609-617.
[4]
Heterogeneous Memory Enhanced Multimodal Attention Model for Video Question Answering [C]. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2019), 2019: 1999-2007.
[6]
Gemmeke JF, 2017, INT CONF ACOUST SPEE, P776, DOI 10.1109/ICASSP.2017.7952261
[7]
Jiang P, 2020, AAAI CONF ARTIF INTE, V34, P11109
[9]
Li GY, 2023, arXiv, arXiv:2305.17993
[10]
Learning to Answer Questions in Dynamic Audio-Visual Scenarios [C]. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022), 2022: 19086-19096.