Vehicle Classification Algorithm Based on Improved Vision Transformer

Times cited: 2
Authors
Dong, Xinlong [1]
Shi, Peicheng [1]
Tang, Yueyue [1]
Yang, Li [1]
Yang, Aixi [2]
Liang, Taonian [3]
Affiliations
[1] Anhui Polytech Univ, Sch Mech & Automot Engn, Wuhu 241000, Peoples R China
[2] Zhejiang Univ, Polytech Inst, Hangzhou 310015, Peoples R China
[3] Chery New Energy Automobile Co Ltd, Wuhu 241000, Peoples R China
Source
WORLD ELECTRIC VEHICLE JOURNAL | 2024 / Vol. 15 / Issue 08
Keywords
vehicle classification; vision transformer; local detail features; sparse attention module; contrast loss
DOI
10.3390/wevj15080344
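Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract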
Vehicle classification is one of the foundations of autonomous driving. With the development of deep learning, vision transformer architectures based on attention mechanisms can represent global information quickly and effectively. However, because the image is split directly into patches, local feature details are lost. To solve this problem, we propose an improved vision transformer vehicle classification network (IND-ViT). Specifically, we first design a CNN-In D branch module that extracts local features before the image is split, compensating for the detail information lost in the vision transformer. Then, to address the misdetection caused by the high similarity between some vehicles, we propose a sparse attention module that screens out the discernible regions in the image and further improves the model's ability to represent detailed features. Finally, we use a contrast loss function to further increase the intra-class consistency and inter-class difference of the classification features and improve vehicle classification accuracy. Experimental results show that the proposed model is more accurate than the original vision transformer on the BIT-Vehicles vehicle classification dataset and on CIFAR-10, Oxford Flower-102, and Caltech-101, with gains of 1.3%, 1.21%, 7.54%, and 3.60%, respectively. It also meets a certain real-time requirement, balancing accuracy and speed.
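The abstract names a contrast loss that increases intra-class consistency and inter-class difference but does not give its formula. As a rough illustration only, the sketch below shows one common pairwise form of such a loss on L2-normalised features. The function name `contrastive_loss`, the cosine-similarity formulation, and the `margin` value are all assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def contrastive_loss(features, labels, margin=0.4):
    """Illustrative batch contrastive loss (an assumed form, not the paper's).

    Same-class pairs are penalised by (1 - cosine similarity), pulling them
    together. Different-class pairs are penalised only when their cosine
    similarity exceeds `margin`, pushing them apart.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    n = features.shape[0]

    # L2-normalise each feature vector so dot products are cosine similarities.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    z = features / np.clip(norms, 1e-12, None)
    sim = z @ z.T

    # Boolean mask: True where the two samples share a class label.
    same = labels[:, None] == labels[None, :]

    pos = np.where(same, 1.0 - sim, 0.0)                       # intra-class term
    neg = np.where(~same, np.maximum(sim - margin, 0.0), 0.0)  # inter-class term
    return float((pos + neg).sum() / (n * n))


# Hypothetical 2-D features for three samples.
feats = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
well_separated = contrastive_loss(feats, [0, 0, 1])  # identical vectors share a class
mixed_up = contrastive_loss(feats, [0, 1, 0])        # identical vectors in different classes
```

In this toy case the loss is zero when the two identical feature vectors share a label and positive when they are assigned to different classes, which is the behaviour the abstract describes in words.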
Pages: 18