102 entries in total
- [1] Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018: 3674-3683
- [2] 3D Scene Graph: A structure for unified semantics, 3D space, and camera [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019: 5663-5672
- [3] Bian N, 2024, Arxiv, DOI 10.48550/arXiv.2303.16421
- [4] Bollacker K., 2008, P ACM SIGMOD INT C M, P1247, DOI 10.5555/1619797.1619981
- [5] Brown TB, 2020, ADV NEUR IN, V33
- [6] Chowdhery A, 2023, J MACH LEARN RES, V24
- [7] Learning to Act Properly: Predicting and Explaining Affordances from Images [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018: 975-983
- [8] Corona R, 2022, PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022): (SHORT PAPERS), VOL 2, P54
- [9] Dehghani Mostafa, P MACHINE LEARNING R
- [10] Deitke M, 2022, Arxiv, DOI arXiv:2210.06849