5 items in total
- [1] Learning from Unlabeled 3D Environments for Vision-and-Language Navigation. Computer Vision - ECCV 2022, Pt XXXIX, 2022, 13699: 638-655
- [2] LLM as Copilot for Coarse-Grained Vision-and-Language Navigation. Computer Vision - ECCV 2024, Pt V, 2025, 15063: 459-476
- [3] FashionViL: Fashion-Focused Vision-and-Language Representation Learning. Computer Vision - ECCV 2022, Pt XXXV, 2022, 13695: 634-651
- [5] A Comparison of Pre-trained Vision-and-Language Models for Multimodal Representation Learning across Medical Images and Reports. 2020 IEEE International Conference on Bioinformatics and Biomedicine, 2020: 1999-2004