CNN-Transformer with Stepped Distillation for Fine-Grained Visual Classification

Cited by: 0

Authors:

Xu, Qin [1,2]

Liu, Peng [1,2]

Wang, Jiahui [1,2]

Huang, Lili [1,2]

Tang, Jin [1,2]

Affiliations:

[1] Anhui Univ, Sch Comp Sci & Technol, Hefei 230601, Peoples R China

[2] Anhui Prov Key Lab Multimodal Cognit Computat, Hefei, Peoples R China

Source:

PATTERN RECOGNITION AND COMPUTER VISION, PT IX, PRCV 2024 | 2025 / Vol. 15039

Funding:

National Natural Science Foundation of China;

Keywords:

Fine-grained Visual Classification; Convolutional Neural Network; Vision Transformer; Knowledge Distillation;

DOI:

10.1007/978-981-97-8692-3_26

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Capturing both fine-grained details and global information is vital for fine-grained visual classification (FGVC). However, most existing methods rely on either a CNN or a Transformer alone, and so cannot effectively extract both local and long-range features for FGVC. To address this issue, we propose the CNN-Transformer with stepped distillation (SDCT) for FGVC. In this method, we propose the multi-level and multi-scale feature combiner (MLMSFC) to capture rich local features. In the MLMSFC, the image-level features of different scales extracted by the backbone CNN are fused with the part-level features of different scales obtained by the parts selector module and a CNN that shares weights with the backbone. Meanwhile, the Transformer is used to extract global features from the input image. Moreover, to teach the CNNs to learn more global image features for inference, we propose the stepped distillation (SD) module. Through step-wise distillation, each shallower stage learns effective features guided by the deeper stage, which improves the local and global feature representation and enhances generalization. Experiments on three popular FGVC datasets demonstrate that the proposed SDCT achieves competitive results compared with state-of-the-art methods.
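
The abstract says only that each shallower stage is guided by the deeper one; it does not give the loss. As a rough illustration of that idea, the sketch below chains generic temperature-scaled knowledge-distillation terms (KL divergence between softened predictions) across consecutive stages. The function names, the per-stage logits, the temperature value, and the choice of KL divergence are all assumptions made for illustration and are not taken from the paper.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def stepped_distillation_loss(stage_logits, temperature=2.0):
    """Sum of distillation terms in which stage i learns from stage i + 1.

    stage_logits is ordered shallow -> deep (hypothetical per-stage
    classifier outputs); the last entry acts only as a teacher.
    """
    loss = 0.0
    for student, teacher in zip(stage_logits[:-1], stage_logits[1:]):
        p_teacher = softmax(teacher, temperature)
        p_student = softmax(student, temperature)
        # T^2 scaling is the usual convention for softened targets.
        loss += (temperature ** 2) * kl_divergence(p_teacher, p_student)
    return loss

# Made-up logits for three stages over three classes.
stages = [[0.2, 0.1, 0.0], [1.0, 0.2, -0.5], [2.5, 0.1, -1.0]]
loss = stepped_distillation_loss(stages)
identical = stepped_distillation_loss([[1.0, 2.0, 3.0]] * 3)
```

In this sketch the loss is positive whenever adjacent stages disagree and vanishes when all stages predict the same distribution; in an actual training setup such a term would typically be added to the ordinary classification loss.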

Pages: 364-377

Page count: 14

