Efficient feature selection for pre-trained vision transformers

Cited by: 2
Authors
Huang, Lan [1 ,2 ]
Zeng, Jia [1 ]
Yu, Mengqiang [1 ]
Ding, Weiping [3 ]
Bai, Xingyu [1 ]
Wang, Kangping [1 ,2 ]
Affiliations
[1] Jilin Univ, Coll Comp Sci & Technol, Changchun 130012, Peoples R China
[2] Jilin Univ, Key Lab Symbol Computat & Knowledge Engn, Minist Educ, Changchun 130012, Peoples R China
[3] Nantong Univ, Sch Informat Sci & Technol, Nantong 226019, Peoples R China
Keywords
Feature selection; Vision transformer; Model pruning
DOI
10.1016/j.cviu.2025.104326
CLC number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Handcrafted layer-wise vision transformers have demonstrated remarkable performance in image classification. However, their high computational cost limits their practical applications. In this paper, we first identify and highlight the data-independent feature redundancy in pre-trained Vision Transformer (ViT) models. Based on this observation, we explore the feasibility of searching for the best substructure within the original pre-trained model. To this end, we propose EffiSelecViT, a novel pruning method aimed at reducing the computational cost of ViTs while preserving their accuracy. EffiSelecViT introduces importance scores for both self-attention heads and Multi-Layer Perceptron (MLP) neurons in pre-trained ViT models. L1 regularization is applied to constrain and learn these scores. In this simple way, components that are crucial for model performance are assigned higher scores, while those with lower scores are identified as less important and subsequently pruned. Experimental results demonstrate that EffiSelecViT can prune DeiT-B to retain only 64% of FLOPs while maintaining accuracy. This efficiency-accuracy trade-off is consistent across various ViT architectures. Furthermore, qualitative analysis reveals enhanced information expression in the pruned models, affirming the effectiveness and practicality of EffiSelecViT. The code is available at https://github.com/ZJ6789/EffiSelecViT.
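The abstract describes the core mechanism in words: attach a learnable importance score to every self-attention head and every MLP neuron, train the scores under an L1 penalty, and prune the low-scoring components. The following PyTorch sketch is a minimal illustration of that idea under stated assumptions; it is not the authors' implementation (that is in the linked repository), and the names used here (GatedBlock, head_score, neuron_score, l1_penalty) are hypothetical.

```python
# Minimal sketch of head/neuron importance scoring with L1 regularization,
# loosely following the mechanism described in the abstract. Illustrative
# only; see the authors' repository for the actual EffiSelecViT code.
import torch
import torch.nn as nn

class GatedBlock(nn.Module):
    """One ViT-style block with importance gates on heads and MLP neurons."""
    def __init__(self, dim=192, num_heads=3, mlp_ratio=4):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.norm1 = nn.LayerNorm(dim)
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        self.norm2 = nn.LayerNorm(dim)
        hidden = dim * mlp_ratio
        self.fc1 = nn.Linear(dim, hidden)
        self.fc2 = nn.Linear(hidden, dim)
        # Learnable importance scores, initialized to 1 (everything kept).
        self.head_score = nn.Parameter(torch.ones(num_heads))
        self.neuron_score = nn.Parameter(torch.ones(hidden))

    def forward(self, x):
        B, N, C = x.shape
        q, k, v = self.qkv(self.norm1(x)).reshape(
            B, N, 3, self.num_heads, self.head_dim).permute(2, 0, 3, 1, 4)
        attn = (q @ k.transpose(-2, -1)) * self.head_dim ** -0.5
        out = attn.softmax(dim=-1) @ v                 # (B, heads, N, head_dim)
        out = out * self.head_score.view(1, -1, 1, 1)  # gate each attention head
        x = x + self.proj(out.transpose(1, 2).reshape(B, N, C))
        h = torch.relu(self.fc1(self.norm2(x)))
        h = h * self.neuron_score                      # gate each MLP neuron
        return x + self.fc2(h)

def l1_penalty(model, strength=1e-4):
    """L1 regularizer over all importance scores; added to the task loss."""
    return strength * sum(p.abs().sum() for n, p in model.named_parameters()
                          if n.endswith("_score"))

# Usage sketch: total_loss = task_loss + l1_penalty(model). After training,
# heads and neurons whose scores are near zero are treated as unimportant
# and removed (e.g., by a threshold or a target keep ratio).
```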
Pages: 10
Related papers
50 records in total
[21] Arshed, Muhammad Asad; Mumtaz, Shahzad; Ibrahim, Muhammad; Ahmed, Saeed; Tahir, Muhammad; Shafi, Muhammad. Multi-Class Skin Cancer Classification Using Vision Transformer Networks and Convolutional Neural Network-Based Pre-Trained Models. INFORMATION, 2023, 14(07).
[22] Xuan, P.; Guo, M. Z.; Wang, J.; Wang, C. Y.; Liu, X. Y.; Liu, Y. Genetic algorithm-based efficient feature selection for classification of pre-miRNAs. GENETICS AND MOLECULAR RESEARCH, 2011, 10(02): 588-603.
[23] Devi, Nilakshi; Borah, Bhogeswar. Refining the features transferred from pre-trained inception architecture for aerial scene classification. INTERNATIONAL JOURNAL OF REMOTE SENSING, 2019, 40(24): 9260-9278.
[24] Diko, Anxhelo; Avola, Danilo; Cascio, Marco; Cinque, Luigi. ReViT: Enhancing vision transformers feature diversity with attention residual connections. PATTERN RECOGNITION, 2024, 156.
[25] Papa, Lorenzo; Russo, Paolo; Amerini, Irene; Zhou, Luping. A Survey on Efficient Vision Transformers: Algorithms, Techniques, and Performance Benchmarking. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46(12): 7682-7700.
[26] Zheng, Xu; Luo, Yunhao; Zhou, Pengyuan; Wang, Lin. Distilling efficient Vision Transformers from CNNs for semantic segmentation. PATTERN RECOGNITION, 2025, 158.
[27] Son, Seungwoo; Ryu, Jegwang; Lee, Namhoon; Lee, Jaeho. The Role of Masking for Efficient Supervised Knowledge Distillation of Vision Transformers. COMPUTER VISION - ECCV 2024, PT LXVII, 2025, 15125: 379-396.
[28] Nazir, M.; Ishtiaq, Muhammad; Batool, Anab; Jaffar, M. Arfan; Mirza, Anwar M. Feature Selection for Efficient Gender Classification. RECENT ADVANCES IN NEURAL NETWORKS, FUZZY SYSTEMS & EVOLUTIONARY COMPUTING, 2010: 70-75.
[29] Seghouane, Abd-Krim; Ong, Ju Lynn. Efficient feature selection for polyp detection. 2010 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, 2010: 2285-2288.
[30] Wang, Haoyu; Zhang, Wei-Qiang. Unstructured Pruning and Low Rank Factorisation of Self-Supervised Pre-Trained Speech Models. IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2024, 18(06): 1046-1058.