共 58 条
[1]
Audio Visual Scene-Aware Dialog
[J].
2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019),
2019,
:7550-7559
[2]
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
[J].
2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR),
2018,
:6077-6086
[3]
Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments
[J].
2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR),
2018,
:3674-3683
[4]
[Anonymous], 2018, ARXIV180600525
[5]
[Anonymous], CVPR
[6]
VQA: Visual Question Answering
[J].
2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV),
2015,
:2425-2433
[7]
Ba J.L., 2016, stat, VVolume 29, P3617, DOI 10.48550/arXiv.1607.06450
[8]
Banerjee S., 2005, P ACL WORKSH INTR EX, P65
[9]
Bengio Y., 2013, CoRR abs/1308.3432
[10]
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
[J].
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017),
2017,
:4724-4733