VLN-PETL: Parameter-Efficient Transfer Learning for Vision-and-Language Navigation

Cited by: 8
Authors
Qiao, Yanyuan [1]
Yu, Zheng [1]
Wu, Qi [1]
Affiliations
[1] Univ Adelaide, Australian Inst Machine Learning, Adelaide, SA, Australia
Source
2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023) | 2023
DOI
10.1109/ICCV51070.2023.01416
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The performance of Vision-and-Language Navigation (VLN) tasks has progressed rapidly in recent years thanks to large pre-trained vision-and-language models. However, fully fine-tuning the pre-trained model for every downstream VLN task is becoming costly because of the considerable model size. Parameter-Efficient Transfer Learning (PETL), a recent research hotspot, shows great potential for efficiently tuning large pre-trained models on common CV and NLP tasks: it exploits most of the representation knowledge contained in the pre-trained model while tuning only a minimal set of parameters. However, simply applying existing PETL methods to the more challenging VLN tasks may cause non-trivial performance degradation. Therefore, we present the first study to explore PETL methods for VLN tasks and propose a VLN-specific PETL method named VLN-PETL. Specifically, we design two PETL modules: Historical Interaction Booster (HIB) and Cross-modal Interaction Booster (CIB). We then combine these two modules with several existing PETL methods to form the integrated VLN-PETL. Extensive experimental results on four mainstream VLN tasks (R2R, REVERIE, NDH, RxR) demonstrate the effectiveness of the proposed VLN-PETL, which achieves performance comparable to or even better than full fine-tuning and outperforms other PETL methods by promising margins. The source code is available at https://github.com/YanyuanQiao/VLN-PETL
Pages: 15397-15406
Page count: 10