Vehicle Classification Algorithm Based on Improved Vision Transformer

Times cited: 2

Authors:

Dong, Xinlong [1]

Shi, Peicheng [1]

Tang, Yueyue [1]

Yang, Li [1]

Yang, Aixi [2]

Liang, Taonian [3]

Affiliations:

[1] Anhui Polytech Univ, Sch Mech & Automot Engn, Wuhu 241000, Peoples R China

[2] Zhejiang Univ, Polytech Inst, Hangzhou 310015, Peoples R China

[3] Chery New Energy Automobile Co Ltd, Wuhu 241000, Peoples R China

Source:

WORLD ELECTRIC VEHICLE JOURNAL | 2024 / Vol. 15 / Issue 08

Keywords:

vehicle classification; vision transformer; local detail features; sparse attention module; contrast loss;

DOI:

10.3390/wevj15080344

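Chinese Library Classification (CLC) number:

TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];

Discipline classification code: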

0808 ; 0809 ;

Abstract:

Vehicle classification is one of the foundations of autonomous driving. With the development of deep learning, vision transformer architectures based on attention mechanisms can represent global information quickly and effectively. However, because the image is split directly into patches, local feature details are lost. To solve this problem, we propose an improved vision transformer vehicle classification network (IND-ViT). Specifically, we first design a CNN-In D branch module that extracts local features before the image is split, compensating for the loss of detail information in the vision transformer. Then, to reduce misclassification caused by the high similarity between some vehicles, we propose a sparse attention module that screens out the discriminative regions in the image and further improves the model's ability to represent detailed features. Finally, we use a contrast loss function to further increase the intra-class consistency and inter-class difference of the classification features and improve vehicle classification accuracy. Experimental results show that the proposed model is more accurate than the original vision transformer on the BIT-Vehicles, CIFAR-10, Oxford Flower-102, and Caltech-101 datasets, with gains of 1.3%, 1.21%, 7.54%, and 3.60%, respectively. It also meets a certain real-time requirement, balancing accuracy and real-time performance.
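The abstract names a contrast loss that raises intra-class consistency and inter-class difference but does not give its form. The following is a minimal sketch of one common pairwise formulation, not the authors' implementation: the cosine-similarity measure, the `margin` value of 0.4, and the function names `cosine_similarity` and `contrastive_loss` are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def contrastive_loss(features, labels, margin=0.4):
    """Mean pairwise contrastive loss over a batch (illustrative form).

    Same-class pairs are penalized by (1 - similarity), which pulls
    their features together. Different-class pairs are penalized only
    when their similarity exceeds `margin`, which pushes them apart.
    The margin value is an assumption, not taken from the paper.
    """
    n = len(features)
    total = 0.0
    for i in range(n):
        for j in range(n):
            sim = cosine_similarity(features[i], features[j])
            if labels[i] == labels[j]:
                total += 1.0 - sim
            else:
                total += max(0.0, sim - margin)
    return total / (n * n)


# Toy batch: two identical features and one orthogonal feature.
features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
separated = contrastive_loss(features, labels=[0, 0, 1])
confused = contrastive_loss(features, labels=[0, 1, 0])
```

With labels that match the feature geometry (`separated`) the loss is zero, while labels that group dissimilar features and split identical ones (`confused`) give a larger value, which is the behavior the abstract attributes to the loss.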

Pages: 18
