Lite transformer with medium self attention for efficient traffic sign recognition

Cited: 0
Authors
Xiao, Junbi [1 ]
Zhang, Qi [1 ]
Gong, Wenjuan [1 ]
Liu, Jianhang [1 ]
Affiliations
[1] China Univ Petr East China, Qingdao Inst Software, Coll Comp Sci & Technol, Qingdao 266580, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Autonomous driving systems; Traffic sign recognition; Lightweight transformer; Attention; NETWORK;
DOI
10.1016/j.jvcir.2025.104502
Chinese Library Classification (CLC)
TP [Automation technology, computer technology];
Discipline code
0812;
Abstract
The accuracy of traffic sign recognition is critical for autonomous driving systems. This paper introduces the Indexing-and-Low-Rank-Medium Self Attention mechanism, a self-attention variant designed to reduce model size and computational demand. The mechanism establishes macro-regional connections between queries and keys through indexing and computes their similarities efficiently with low-rank matrices, thereby reducing the computational overhead. To counter the feature loss that low-rank approximation can introduce, particularly in fine traffic sign details, we integrate a feature enhancement technique: selective thresholding at the start of feature extraction emphasizes essential features and suppresses less significant ones without significantly increasing the parameter count. This streamlined design forms the foundation of our lightweight model, IMSA-Net. IMSA-Net reaches 81.7% accuracy on the ImageNet-1K dataset, a 3% improvement over MobileFormer, while using 45.7% fewer parameters. It further surpasses models such as MobileFormer with accuracies of 93.75% on the German Traffic Sign Recognition Benchmark and 92.97% on the Chinese Traffic Sign Database. These results substantiate the efficiency and effectiveness of IMSA-Net in traffic sign recognition tasks.
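The abstract describes two ingredients: computing query-key similarities against low-rank projections of the keys and values, and an input-side selective thresholding step that suppresses weak activations. The following NumPy sketch illustrates that general idea only; the function names, the projection matrices `E` and `F`, and the threshold `tau` are illustrative assumptions, not the paper's actual IMSA-Net implementation.

```python
import numpy as np

def low_rank_attention(Q, K, V, E, F):
    # Project keys and values into a low-rank subspace (n -> r), so each
    # query attends to r summary rows instead of all n tokens.
    # Cost drops from O(n^2 d) to O(n r d).
    K_r = E @ K                                   # (r, d)
    V_r = F @ V                                   # (r, d)
    scores = Q @ K_r.T / np.sqrt(Q.shape[-1])     # (n, r)
    # Row-wise softmax over the reduced dimension.
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V_r                                # (n, d)

def soft_threshold(x, tau=0.1):
    # Selective thresholding: shrink activations toward zero, zeroing out
    # those below tau while keeping salient features.
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

rng = np.random.default_rng(0)
n, d, r = 16, 8, 4                                # tokens, channels, rank
Q = rng.normal(size=(n, d))
K = rng.normal(size=(n, d))
V = rng.normal(size=(n, d))
E = rng.normal(size=(r, n)) / np.sqrt(n)          # learned in practice
F = rng.normal(size=(r, n)) / np.sqrt(n)
out = low_rank_attention(soft_threshold(Q), K, V, E, F)
print(out.shape)  # (16, 8)
```

With rank r fixed and much smaller than the sequence length n, the attention map is n x r rather than n x n, which is the source of the parameter and compute savings the abstract claims; the thresholding step adds no learned parameters at all.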
Pages: 14