Efficient feature selection for pre-trained vision transformers

Times Cited: 0
Authors
Huang, Lan [1 ,2 ]
Zeng, Jia [1 ]
Yu, Mengqiang [1 ]
Ding, Weiping [3 ]
Bai, Xingyu [1 ]
Wang, Kangping [1 ,2 ]
Affiliations
[1] Jilin Univ, Coll Comp Sci & Technol, Changchun 130012, Peoples R China
[2] Jilin Univ, Key Lab Symbol Computat & Knowledge Engn, Minist Educ, Changchun 130012, Peoples R China
[3] Nantong Univ, Sch Informat Sci & Technol, Nantong 226019, Peoples R China
Keywords
Feature selection; Vision transformer; Model pruning
DOI
10.1016/j.cviu.2025.104326
CLC Number
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Handcrafted layer-wise vision transformers have demonstrated remarkable performance in image classification. However, their high computational cost limits their practical applications. In this paper, we first identify and highlight the data-independent feature redundancy in pre-trained Vision Transformer (ViT) models. Based on this observation, we explore the feasibility of searching for the best substructure within the original pre-trained model. To this end, we propose EffiSelecViT, a novel pruning method aimed at reducing the computational cost of ViTs while preserving their accuracy. EffiSelecViT introduces importance scores for both self-attention heads and Multi-Layer Perceptron (MLP) neurons in pre-trained ViT models. L1 regularization is applied to constrain and learn these scores. In this simple way, components that are crucial for model performance are assigned higher scores, while those with lower scores are identified as less important and subsequently pruned. Experimental results demonstrate that EffiSelecViT can prune DeiT-B to retain only 64% of FLOPs while maintaining accuracy. This efficiency-accuracy trade-off is consistent across various ViT architectures. Furthermore, qualitative analysis reveals enhanced information expression in the pruned models, affirming the effectiveness and practicality of EffiSelecViT. The code is available at https://github.com/ZJ6789/EffiSelecViT.
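The score-and-prune mechanism described in the abstract can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation (see their repository for that): the gate list `scores`, the `l1_step` update, and the `keep_ratio` threshold are hypothetical names, and the paper learns the importance scores jointly with the full network rather than on a toy objective.

```python
# Minimal sketch of score-based structured pruning with L1 regularization.
# Assumption: one learnable scalar importance score per attention head or
# MLP neuron, trained alongside the frozen/pre-trained ViT weights.

def l1_step(scores, task_grads, lam=0.01, lr=0.1):
    """One SGD step on  L = task_loss + lam * sum(|s_i|).
    The L1 term's subgradient is sign(s_i), which drives the scores of
    unimportant components toward zero."""
    def sign(s):
        return (s > 0) - (s < 0)
    return [s - lr * (g + lam * sign(s)) for s, g in zip(scores, task_grads)]

def prune_by_score(scores, keep_ratio=0.64):
    """Return sorted indices of the components with the largest |score|;
    the remaining components would be removed from the pre-trained model."""
    k = max(1, int(len(scores) * keep_ratio))
    ranked = sorted(range(len(scores)), key=lambda i: abs(scores[i]),
                    reverse=True)
    return sorted(ranked[:k])

# Example: four attention heads; heads 0 and 2 carry most of the importance.
scores = [0.90, 0.01, -0.50, 0.02]
kept = prune_by_score(scores, keep_ratio=0.5)
print(kept)  # -> [0, 2]
```

The 0.64 default mirrors the abstract's reported operating point for DeiT-B (64% of the original FLOPs retained); in practice the keep ratio is a tunable efficiency-accuracy knob.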
Pages: 10
Related Papers
50 records
  • [1] Classifying microfossil radiolarians on fractal pre-trained vision transformers
    Mimura, Kazuhide
    Itaki, Takuya
    Kataoka, Hirokatsu
    Miyakawa, Ayumu
    SCIENTIFIC REPORTS, 2025, 15 (01)
  • [2] ViTMatte: Boosting image matting with pre-trained plain vision transformers
    Yao, Jingfeng
    Wang, Xinggang
    Yang, Shusheng
    Wang, Baoyuan
    INFORMATION FUSION, 2024, 103
  • [3] Interpretable domain adaptation using unsupervised feature selection on pre-trained source models
    Zhang, Luxin
    Germain, Pascal
    Kessaci, Yacine
    Biernacki, Christophe
    NEUROCOMPUTING, 2022, 511 : 319 - 336
  • [4] Modified genetic algorithm-based feature selection combined with pre-trained deep neural network for demand forecasting in outpatient department
    Jiang, Shancheng
    Chin, Kwai-Sang
    Wang, Long
    Qu, Gang
    Tsui, Kwok L.
    EXPERT SYSTEMS WITH APPLICATIONS, 2017, 82 : 216 - 230
  • [5] Underwater Image Enhancement Using Pre-trained Transformer
    Boudiaf, Abderrahmene
    Guo, Yuhang
    Ghimire, Adarsh
    Werghi, Naoufel
    De Masi, Giulia
    Javed, Sajid
    Dias, Jorge
    IMAGE ANALYSIS AND PROCESSING, ICIAP 2022, PT III, 2022, 13233 : 480 - 488
  • [6] Token Selection is a Simple Booster for Vision Transformers
    Zhou, Daquan
    Hou, Qibin
    Yang, Linjie
    Jin, Xiaojie
    Feng, Jiashi
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (11) : 12738 - 12746
  • [7] AttnZero: Efficient Attention Discovery for Vision Transformers
    Li, Lujun
    Wei, Zimian
    Dong, Peijie
    Luo, Wenhan
    Xue, Wei
    Liu, Qifeng
    Guo, Yike
    COMPUTER VISION - ECCV 2024, PT V, 2025, 15063 : 20 - 37
  • [8] An optimal deep learning approach for breast cancer detection and classification with pre-trained CNN-based feature learning mechanism
    Meena, L. C.
    Joe Prathap, P. M.
    JOURNAL OF BIOMOLECULAR STRUCTURE & DYNAMICS, 2024
  • [9] Target to Source Coordinate-Wise Adaptation of Pre-trained Models
    Zhang, Luxin
    Germain, Pascal
    Kessaci, Yacine
    Biernacki, Christophe
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2020, PT I, 2021, 12457 : 378 - 394
  • [10] Towards Efficient Adversarial Training on Vision Transformers
    Wu, Boxi
    Gu, Jindong
    Li, Zhifeng
    Cai, Deng
    He, Xiaofei
    Liu, Wei
    COMPUTER VISION, ECCV 2022, PT XIII, 2022, 13673 : 307 - 325