91 entries in total
[1]
Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments
[J].
2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR),
2018,
:3674-3683
[2]
[Anonymous], 2021, NEURIPS, DOI 10.1016/J.JCBS.2021.01.008
[3]
VQA: Visual Question Answering
[J].
2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV),
2015,
:2425-2433
[4]
YOLACT: Real-time Instance Segmentation
[J].
2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019),
2019,
:9156-9165
[5]
Cai Z., 2022, ARXIV220405626
[6]
Carion N., 2020, P EUR C COMP VIS GLA, P213, DOI 10.1007/978-3-030-58452-8_13
[7]
Chen LB, 2017, IEEE INT SYMP NANO, P1, DOI 10.1109/NANOARCH.2017.8053709
[8]
Chen Ting, 2022, ARXIV220607669
[9]
Chen X., 2015, Microsoft COCO captions: Data collection and evaluation server
[10]
Chen Yen-Chun, 2020, ECCV