241 entries in total
[1]
Abnar M., 2020, ARXIV
[2]
Abnar S., 2020, ARXIV200600555
[3]
ReferIt3D: Neural Listeners for Fine-Grained 3D Object Identification in Real-World Scenes [J]. COMPUTER VISION - ECCV 2020, PT I, 2020, 12346: 422-440
[4]
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018: 6077-6086
[5]
[Anonymous], 2017, INT J COMPUT VISION, V123, P32
[6]
[Anonymous], Robot Learning, DOI 10.48550/arXiv.2110.06922
[7]
[Anonymous], 2020, P IEEE CVF C COMP VI, DOI 10.1109/ICCWAMTIP51612.2020.9317476
[8]
[Anonymous], 2021, P IEEE CVF C COMP VI, DOI 10.1109/TSMC.2019.2958072
[9]
VQA: Visual Question Answering [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015: 2425-2433
[10]
Ba J.L., 2016, CORR