SCT: A Simple Baseline for Parameter-Efficient Fine-Tuning via Salient Channels

Cited by: 4
Authors
Zhao, Henry Hengyuan [1 ]
Wang, Pichao [2 ]
Zhao, Yuyang [1 ]
Luo, Hao [3 ]
Wang, Fan [2 ]
Shou, Mike Zheng [1 ]
Affiliations
[1] Natl Univ Singapore, Show Lab, Singapore, Singapore
[2] Alibaba Grp, Sunnyvale, CA USA
[3] Alibaba Grp, Hangzhou, Peoples R China
Funding
National Research Foundation, Singapore
Keywords
Vision transformer; Transfer learning; Efficient fine-tuning; Representation learning; Few-shot learning; Domain generalization;
DOI
10.1007/s11263-023-01918-3
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Pre-trained vision transformers learn strong representations that benefit a wide range of downstream tasks. Recently, many parameter-efficient fine-tuning (PEFT) methods have been proposed, and their experiments demonstrate that tuning only about 1% extra parameters can surpass full fine-tuning in low-data scenarios. However, these methods overlook task-specific information when fine-tuning diverse downstream tasks. In this paper, we propose a simple yet effective method called "Salient Channel Tuning" (SCT), which leverages task-specific information by forwarding the model on task images to select a subset of channels in a feature map, enabling us to tune only 1/8 of the channels and thus greatly reduce parameter costs. SCT outperforms full fine-tuning on 18 of the 19 tasks in the VTAB-1K benchmark while adding only 0.11M parameters to ViT-B, 780x fewer than its full fine-tuning counterpart. Furthermore, in domain generalization and few-shot learning experiments, SCT surpasses other PEFT methods at lower parameter cost, demonstrating the strong capability and effectiveness of the proposed tuning technique in the low-data regime. The code will be available at https://github.com/zhaohengyuan1/SCT.git
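The core idea in the abstract — forward task images through the frozen model, rank channels by some saliency score, and tune only the top 1/8 of them — can be sketched as follows. This is an illustrative sketch only: the saliency criterion (mean absolute activation) and the per-channel additive update used here are assumptions for illustration, not the paper's exact method; function names are hypothetical.

```python
import numpy as np

def select_salient_channels(features, keep_ratio=1/8):
    """Score each channel by its mean absolute activation over the task
    images and keep the top `keep_ratio` fraction of channels.

    features: array of shape (num_images, num_channels) — e.g. pooled
    ViT features from a forward pass on task images.
    NOTE: mean |activation| is an assumed saliency criterion; the
    paper's actual scoring rule may differ.
    """
    saliency = np.abs(features).mean(axis=0)           # (C,) per-channel score
    k = max(1, int(features.shape[1] * keep_ratio))    # number of channels to tune
    return np.argsort(saliency)[-k:]                   # indices of the top-k channels

def apply_channel_delta(features, indices, delta):
    """Add a learned per-channel offset only on the salient channels,
    leaving the frozen backbone's remaining channels untouched."""
    out = features.copy()
    out[:, indices] += delta                           # delta has shape (k,)
    return out

# Toy example: 4 "task images", 16-channel features -> tune 16/8 = 2 channels.
rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 16))
salient_idx = select_salient_channels(feats)
print(len(salient_idx))  # 2
```

Under this sketch, only the `delta` parameters on the selected channels would receive gradients during fine-tuning, which is what keeps the added parameter count so small relative to full fine-tuning.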
Pages: 731-749 (19 pages)