CNN-Transformer with Stepped Distillation for Fine-Grained Visual Classification

Cited by: 0
Authors
Xu, Qin [1 ,2 ]
Liu, Peng [1 ,2 ]
Wang, Jiahui [1 ,2 ]
Huang, Lili [1 ,2 ]
Tang, Jin [1 ,2 ]
Affiliations
[1] Anhui Univ, Sch Comp Sci & Technol, Hefei 230601, Peoples R China
[2] Anhui Prov Key Lab Multimodal Cognit Computat, Hefei, Peoples R China
Source
PATTERN RECOGNITION AND COMPUTER VISION, PT IX, PRCV 2024 | 2025 / Vol. 15039
Funding
National Natural Science Foundation of China;
Keywords
Fine-grained Visual Classification; Convolutional Neural Network; Vision Transformer; Knowledge Distillation;
DOI
10.1007/978-981-97-8692-3_26
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Capturing both fine-grained details and global information is vital for fine-grained visual classification (FGVC). However, most existing methods rely on either a CNN or a Transformer alone, and so cannot effectively extract both the local and the long-range features needed for FGVC. To address this issue, we propose the CNN-Transformer with stepped distillation (SDCT) for FGVC. In this method, we propose the multi-level and multi-scale feature combiner (MLMSFC) to capture rich local features. In the MLMSFC, the image-level features of different scales extracted by the backbone CNN are fused with the part-level features of different scales obtained by the parts selector module and a CNN that shares weights with the backbone. Meanwhile, the Transformer is used to extract global features from the input image. Moreover, to teach the CNNs to learn more global features of images for inference, we propose the stepped distillation (SD) module. Through step-wise distillation, the shallower stage learns effective features under the guidance of the deeper stage, which improves the local and global feature representation and enhances generalization. Experiments on three popular FGVC datasets demonstrate that the proposed SDCT achieves competitive results compared with state-of-the-art methods.
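The abstract describes stepped distillation only at a high level: each shallower stage is guided by the next deeper one. The sketch below illustrates that general idea with a common soft-label distillation loss (temperature-scaled KL divergence) chained across stages. The record does not give the paper's actual loss, so the function names, the temperature value, the use of class logits per stage, and the assumption that the last entry stands for the deepest teacher are all hypothetical and are not the authors' published formulation.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for two discrete distributions; assumes every q_i > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def stepped_distillation_loss(
    stage_logits: Sequence[Sequence[float]], temperature: float = 4.0
) -> float:
    """Sum the distillation loss of each stage against the next deeper stage.

    stage_logits is ordered shallow -> deep. The last entry acts as the
    deepest teacher (assumed here to be the global branch; this pairing is
    an illustration, not the scheme confirmed by the paper).
    """
    loss = 0.0
    # Pair stage i (student) with stage i + 1 (teacher), one step at a time.
    for student, teacher in zip(stage_logits, stage_logits[1:]):
        p_teacher = softmax(teacher, temperature)
        p_student = softmax(student, temperature)
        # The T^2 factor is the usual scaling for soft-label distillation.
        loss += (temperature ** 2) * kl_divergence(p_teacher, p_student)
    return loss


if __name__ == "__main__":
    # Three hypothetical stages, three classes each.
    stages = [[0.2, 0.1, 0.0], [1.0, 0.3, -0.2], [2.5, 0.4, -1.0]]
    print(stepped_distillation_loss(stages))
```

Chaining adjacent stages, rather than distilling every stage directly from the deepest one, keeps each student close in capacity to its teacher, which is one plausible reading of "stepped" in the abstract.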
Pages: 364 - 377
Number of pages: 14
Related papers
50 in total
  • [1] Fine-Grained Ship Classification by Combining CNN and Swin Transformer
    Huang, Liang
    Wang, Fengxiang
    Zhang, Yalun
    Xu, Qingxia
    REMOTE SENSING, 2022, 14 (13)
  • [2] A CNN-Transformer Network With Multiscale Context Aggregation for Fine-Grained Cropland Change Detection
    Liu, Mengxi
    Chai, Zhuoqun
    Deng, Haojun
    Liu, Rong
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2022, 15 : 4297 - 4306
  • [3] Hierarchical attention vision transformer for fine-grained visual classification
    Hu, Xiaobin
    Zhu, Shining
    Peng, Taile
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2023, 91
  • [4] TransFGVC: transformer-based fine-grained visual classification
    Shen, Longfeng
    Hou, Bin
    Jian, Yulei
    Tu, Xisong
    Zhang, Yingjie
    Shuai, Lingying
    Ge, Fangzhen
    Chen, Debao
    VISUAL COMPUTER, 2025, 41 (04) : 2439 - 2459
  • [5] Dual Transformer With Multi-Grained Assembly for Fine-Grained Visual Classification
    Ji, Ruyi
    Li, Jiaying
    Zhang, Libo
    Liu, Jing
    Wu, Yanjun
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (09) : 5009 - 5021
  • [6] Convolutionally Enhanced Feature Fusion Visual Transformer for Fine-Grained Visual Classification
    Huang, Min
    Zhu, Saixing
    Wang, Zehua
    Qu, Shuanghong
    2024 16TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND COMPUTING, ICMLC 2024, 2024 : 447 - 452
  • [7] CloudSwinNet: A hybrid CNN-transformer framework for ground-based cloud images fine-grained segmentation
    Shi, Chaojun
    Su, Zibo
    Zhang, Ke
    Xie, Xiongbin
    Zhang, Xiaoyun
    ENERGY, 2024, 309
  • [8] A CNN-TRANSFORMER KNOWLEDGE DISTILLATION FOR REMOTE SENSING SCENE CLASSIFICATION
    Nabi, Mostaan
    Maggiolo, Luca
    Moser, Gabriele
    Serpico, Sebastiano B.
    2022 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS 2022), 2022 : 663 - 666
  • [9] Fine-Grained Visual Classification via Internal Ensemble Learning Transformer
    Xu, Qin
    Wang, Jiahui
    Jiang, Bo
    Luo, Bin
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 9015 - 9028
  • [10] Dual-Dependency Attention Transformer for Fine-Grained Visual Classification
    Cui, Shiyan
    Hui, Bin
    SENSORS, 2024, 24 (07)