43 entries in total
- [1] Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018: 6077-6086
- [2] VQA: Visual Question Answering [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015: 2425-2433
- [3] Ba JL, 2016, arXiv
- [4] Bahdanau D, 2016, arXiv:1409.0473
- [5] Chen K, 2016, arXiv:1511.05960, DOI 10.48550/arXiv.1511.05960
- [6] Chorowski J, 2015, ADV NEUR IN, V28
- [8] Improved Fusion of Visual and Language Representations by Dense Symmetric Co-Attention for Visual Question Answering [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018: 6087-6096
- [9] Fukui A, 2016, arXiv
- [10] Gao HY, 2015, ADV NEUR IN, V28