Add-Vit: CNN-Transformer Hybrid Architecture for Small Data Paradigm Processing

Cited by: 3
|
Authors
Chen, Jinhui [1 ]
Wu, Peng [1 ]
Zhang, Xiaoming [2 ]
Xu, Renjie [3 ]
Liang, Jia [1 ]
Affiliations
[1] Zhejiang Sci Tech Univ, Sch Mech Engn, 928 2nd St, Hangzhou 310018, Zhejiang, Peoples R China
[2] Army Acad Armored Forces, Dept Vehicle Engn, Dujiakan 21st,Fengtai, Beijing 100072, Peoples R China
[3] Army Acad Armored Forces, Performance & Training Ctr, Dujiakan 21st,Fengtai, Beijing 100072, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Local feature; Vision transformer (ViT); Image classification; Small data paradigm;
DOI
10.1007/s11063-024-11643-8
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The vision transformer (ViT), when pre-trained on large datasets, outperforms convolutional neural networks (CNNs) in computer vision (CV). Without pre-training, however, the transformer architecture performs poorly on small datasets and is surpassed by CNNs. Through analysis, we found that: (1) the division and processing of tokens in the ViT discard the marginalized information between tokens; (2) the isolated multi-head self-attention (MSA) lacks prior knowledge; (3) the local inductive bias of stacked transformer blocks is much inferior to that of CNNs. We propose a novel architecture for small data paradigms without pre-training, named Add-Vit, which uses progressive tokenization with feature supplementation in patch embedding. The model's representational ability is enhanced by a convolutional prediction module shortcut that connects to the MSA and captures local features as additional representations of the tokens. Without pre-training on large datasets, our best model achieved 81.25% accuracy when trained from scratch on CIFAR-100.
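The abstract's core idea — pairing global self-attention with a convolutional shortcut that injects local features into the token representations — can be illustrated with a minimal NumPy sketch. This is not the paper's actual Add-Vit module; the function names, the single attention head, the 1-D depthwise kernel, and the simple additive combination are all illustrative assumptions standing in for the convolutional prediction module described above.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, Wq, Wk, Wv):
    """Single-head self-attention over a (num_tokens, dim) array (global context)."""
    q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    return softmax(scores) @ v

def depthwise_conv_tokens(tokens, kernel):
    """1-D depthwise convolution along the token axis (local inductive bias)."""
    pad = len(kernel) // 2
    padded = np.pad(tokens, ((pad, pad), (0, 0)), mode="edge")
    out = np.zeros_like(tokens)
    for i in range(tokens.shape[0]):
        for j, w in enumerate(kernel):
            out[i] += w * padded[i + j]
    return out

rng = np.random.default_rng(0)
n, d = 16, 32                          # 16 tokens, 32-dim embedding
tokens = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))

attn_out = self_attention(tokens, Wq, Wk, Wv)          # global branch (MSA-like)
local_out = depthwise_conv_tokens(tokens, [0.25, 0.5, 0.25])  # convolutional shortcut
combined = tokens + attn_out + local_out               # residual + global + local
print(combined.shape)  # (16, 32)
```

The convolutional branch here supplies the neighborhood prior that a plain attention block lacks on small datasets, which is the intuition behind combining the two paths.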
Pages: 17
Related Papers
50 records in total
  • [1] A Hybrid CNN-Transformer Architecture for Semantic Segmentation of Radar Sounder data
    Ghosh, Raktim
    Bovolo, Francesca
    2022 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS 2022), 2022, : 1320 - 1323
  • [2] CNN-Transformer Hybrid Architecture for Early Fire Detection
    Yang, Chenyue
    Pan, Yixuan
    Cao, Yichao
    Lu, Xiaobo
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2022, PT IV, 2022, 13532 : 570 - 581
  • [3] CNN-Transformer Hybrid Architecture for Underwater Sonar Image Segmentation
    Lei, Juan
    Wang, Huigang
    Lei, Zelin
    Li, Jiayuan
    Rong, Shaowei
    REMOTE SENSING, 2025, 17 (04)
  • [4] Rethinking Image Deblurring via CNN-Transformer Multiscale Hybrid Architecture
    Zhao, Qian
    Yang, Hao
    Zhou, Dongming
    Cao, Jinde
    IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 2023, 72
  • [8] Weak Appearance Aware Pipeline Leak Detection Based on CNN-Transformer Hybrid Architecture
    Zhang, Bulin
    Yuan, Haiwen
    Ge, Jie
    Cheng, Li
    Li, Xuan
    Xiao, Changshi
    IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 2025, 74
  • [9] Hybrid CNN-transformer network for efficient CSI feedback
    Zhao, Ruohan
    Liu, Ziang
    Song, Tianyu
    Jin, Jiyu
    Jin, Guiyue
    Fan, Lei
    PHYSICAL COMMUNICATION, 2024, 66
  • [10] Image harmonization with Simple Hybrid CNN-Transformer Network
    Li, Guanlin
    Zhao, Bin
    Li, Xuelong
    NEURAL NETWORKS, 2024, 180