72 entries in total
- [1] VQA: Visual Question Answering [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015: 2425-2433
- [2] Arandjelovic R, 2018, IEEE T PATTERN ANAL, V40, P1437, DOI [10.1109/TPAMI.2017.2711011, 10.1109/CVPR.2016.572]
- [3] Bolme DS, 2010, PROC CVPR IEEE, P2544, DOI 10.1109/CVPR.2010.5539960
- [4] Cao Meng, 2021, ARXIV210805607
- [5] Cao Meng, 2021, P 2021 C EMP METH NA
- [6] TOUCHDOWN: Natural Language Navigation and Spatial Reasoning in Visual Street Environments [J]. 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019: 12530-12539
- [7] Human-like Controllable Image Captioning with Verb-specific Semantic Roles [J]. 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021: 16841-16851
- [8] Chen L, 2021, AAAI CONF ARTIF INTE, V35, P1036
- [9] Counterfactual Critic Multi-Agent Training for Scene Graph Generation [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019: 4612-4622
- [10] SCA-CNN: Spatial and Channel-wise Attention in Convolutional Networks for Image Captioning [J]. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017: 6298-6306