CNN-Transformer with Stepped Distillation for Fine-Grained Visual Classification

Cited: 0
Authors
Xu, Qin [1 ,2 ]
Liu, Peng [1 ,2 ]
Wang, Jiahui [1 ,2 ]
Huang, Lili [1 ,2 ]
Tang, Jin [1 ,2 ]
Affiliations
[1] Anhui Univ, Sch Comp Sci & Technol, Hefei 230601, Peoples R China
[2] Anhui Prov Key Lab Multimodal Cognit Computat, Hefei, Peoples R China
Source
PATTERN RECOGNITION AND COMPUTER VISION, PT IX, PRCV 2024 | 2025 / Vol. 15039
Funding
National Natural Science Foundation of China;
Keywords
Fine-grained Visual Classification; Convolutional Neural Network; Vision Transformer; Knowledge Distillation;
DOI
10.1007/978-981-97-8692-3_26
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Capturing both fine-grained details and global information is vital for fine-grained visual classification (FGVC). However, most existing methods rely on either a CNN or a Transformer alone and thus cannot effectively extract both local and long-range features for FGVC. To address this issue, we propose a CNN-Transformer with stepped distillation (SDCT) for FGVC. In this method, we propose a multi-level and multi-scale feature combiner (MLMSFC) to capture rich local features. In the MLMSFC, image-level features of different scales extracted by the backbone CNN are fused with part-level features of different scales obtained by the parts selector module and a CNN that shares weights with the backbone. Meanwhile, a Transformer is utilized to extract global features from the input image. Moreover, to teach the CNNs to learn more global image features for inference, we propose the stepped distillation (SD) module. Through step-wise distillation, each shallower stage learns effective features under the guidance of the deeper stage, improving the local and global feature representation and enhancing generalization capability. Experiments conducted on three popular FGVC datasets demonstrate that the proposed SDCT achieves competitive results compared with state-of-the-art methods.
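The record gives no implementation details beyond the abstract. As a hedged illustration only, the step-wise distillation idea described above (each shallower stage guided by the next deeper stage) can be sketched as a sum of temperature-scaled KL-divergence losses over per-stage predictions. All function and variable names below are hypothetical and not taken from the paper:

```python
import numpy as np

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax over the last axis, numerically stabilized.
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl_div(p, q, eps=1e-12):
    # KL(p || q), averaged over the batch dimension.
    return float(np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)))

def stepped_distillation_loss(stage_logits, temperature=4.0):
    """Sum of distillation losses where each shallower stage is the student
    and the next deeper stage is its teacher.

    stage_logits: list of arrays ordered shallow -> deep,
    each of shape (batch, num_classes).
    """
    loss = 0.0
    for shallow, deep in zip(stage_logits[:-1], stage_logits[1:]):
        teacher = softmax(deep, temperature)   # deeper stage acts as teacher
        student = softmax(shallow, temperature)
        loss += kl_div(teacher, student)
    return loss

# Usage sketch: three stages, batch of 2, 5 classes.
rng = np.random.default_rng(0)
stages = [rng.normal(size=(2, 5)) for _ in range(3)]
print(stepped_distillation_loss(stages))
```

In practice such a loss would be added to the classification loss and backpropagated through the CNN stages; this sketch only shows the pairing of adjacent stages implied by "stepped" distillation.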
Pages: 364-377
Number of pages: 14