VLN-PETL: Parameter-Efficient Transfer Learning for Vision-and-Language Navigation

Cited by: 8

Authors:

Qiao, Yanyuan [1]

Yu, Zheng [1]

Wu, Qi [1]

Affiliations:

[1] Univ Adelaide, Australian Inst Machine Learning, Adelaide, SA, Australia

Source:

2023 IEEE/CVF International Conference on Computer Vision (ICCV 2023) | 2023

DOI:

10.1109/ICCV51070.2023.01416

Chinese Library Classification (CLC):

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Performance on Vision-and-Language Navigation (VLN) tasks has progressed rapidly in recent years thanks to the use of large pre-trained vision-and-language models. However, fully fine-tuning the pre-trained model for every downstream VLN task is becoming costly due to the considerable model size. Parameter-Efficient Transfer Learning (PETL), a recent research hotspot, shows great potential for efficiently tuning large pre-trained models on common CV and NLP tasks: it exploits most of the representation knowledge in the pre-trained model while tuning only a minimal set of parameters. However, simply applying existing PETL methods to the more challenging VLN tasks may cause non-trivial performance degradation. Therefore, we present the first study to explore PETL methods for VLN tasks and propose a VLN-specific PETL method named VLN-PETL. Specifically, we design two PETL modules: Historical Interaction Booster (HIB) and Cross-modal Interaction Booster (CIB). We then combine these two modules with several existing PETL methods to form the integrated VLN-PETL. Extensive experimental results on four mainstream VLN tasks (R2R, REVERIE, NDH, RxR) demonstrate the effectiveness of the proposed VLN-PETL, which achieves performance comparable to, or even better than, full fine-tuning and outperforms other PETL methods by promising margins. The source code is available at https://github.com/YanyuanQiao/VLN-PETL

Pages: 15397-15406

Page count: 10
