SCT: A Simple Baseline for Parameter-Efficient Fine-Tuning via Salient Channels

Cited by: 4
Authors
Zhao, Henry Hengyuan [1 ]
Wang, Pichao [2 ]
Zhao, Yuyang [1 ]
Luo, Hao [3 ]
Wang, Fan [2 ]
Shou, Mike Zheng [1 ]
Affiliations
[1] Natl Univ Singapore, Show Lab, Singapore, Singapore
[2] Alibaba Grp, Sunnyvale, CA USA
[3] Alibaba Grp, Hangzhou, Peoples R China
Funding
National Research Foundation, Singapore
Keywords
Vision transformer; Transfer learning; Efficient fine-tuning; Representation learning; Few-shot learning; Domain generalization;
DOI
10.1007/s11263-023-01918-3
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Pre-trained vision transformers learn strong representations that benefit a wide range of downstream tasks. Recently, many parameter-efficient fine-tuning (PEFT) methods have been proposed, and their experiments demonstrate that tuning only about 1% extra parameters can surpass full fine-tuning in low-data scenarios. However, these methods overlook task-specific information when fine-tuning diverse downstream tasks. In this paper, we propose a simple yet effective method called "Salient Channel Tuning" (SCT), which leverages task-specific information by forwarding the model on task images to select a subset of channels in a feature map, enabling us to tune only 1/8 of the channels and thus greatly reduce parameter costs. SCT outperforms full fine-tuning on 18 of the 19 tasks in the VTAB-1K benchmark while adding only 0.11M parameters to ViT-B, 780x fewer than its full fine-tuning counterpart. Furthermore, in domain generalization and few-shot learning experiments, SCT surpasses other PEFT methods at lower parameter cost, demonstrating the strong capability and effectiveness of the proposed tuning technique in the low-data regime. The code will be available at https://github.com/zhaohengyuan1/SCT.git
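The core idea in the abstract — forward task images through the frozen model, rank channels by some saliency score, and tune only the top 1/8 of them — can be sketched as follows. This is an illustrative sketch only: the saliency criterion (mean absolute activation) and the per-channel additive update used here are assumptions for illustration, not the paper's exact method; function names are hypothetical.

```python
import numpy as np

def select_salient_channels(features, keep_ratio=1/8):
    """Score each channel by its mean absolute activation over the task
    images and keep the top `keep_ratio` fraction of channels.

    features: array of shape (num_images, num_channels) — e.g. pooled
    ViT features from a forward pass on task images.
    NOTE: mean |activation| is an assumed saliency criterion; the
    paper's actual scoring rule may differ.
    """
    saliency = np.abs(features).mean(axis=0)           # (C,) per-channel score
    k = max(1, int(features.shape[1] * keep_ratio))    # number of channels to tune
    return np.argsort(saliency)[-k:]                   # indices of the top-k channels

def apply_channel_delta(features, indices, delta):
    """Add a learned per-channel offset only on the salient channels,
    leaving the frozen backbone's remaining channels untouched."""
    out = features.copy()
    out[:, indices] += delta                           # delta has shape (k,)
    return out

# Toy example: 4 "task images", 16-channel features -> tune 16/8 = 2 channels.
rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 16))
salient_idx = select_salient_channels(feats)
print(len(salient_idx))  # 2
```

Under this sketch, only the `delta` parameters on the selected channels would receive gradients during fine-tuning, which is what keeps the added parameter count so small relative to full fine-tuning.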
Pages: 731-749 (19 pages)