A feature-enhanced hybrid attention network for traffic sign recognition in real scenes

Cited by: 1
Authors
He, Lewei [1, 2]
Lan, Fucai [1]
Zhou, Chuanzhe [1, 3]
Ye, Yaoguang [1]
Zhang, Wencong [1]
Chen, Bingzhi [1]
Pan, Jiahui [1, 4]
Affiliations
[1] South China Normal Univ, Sch Software, Guangzhou, Peoples R China
[2] South China Normal Univ, Math Postdoctoral Res Stn, Guangzhou, Peoples R China
[3] Towngas Energy Acad, Shenzhen, Peoples R China
[4] South China Normal Univ, Sch Software, Foshan 528225, Guangdong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
computer vision; deep learning; object detection; traffic sign recognition; ENVIRONMENT;
DOI
10.1049/ipr2.13083
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Traffic sign recognition techniques are now used in driver-assistance systems. However, recognizing small traffic signs in real scenes remains challenging because of class imbalance and the small size of the signs. To address these issues, a feature-enhanced hybrid attention network based on YOLOv5s is proposed as a small, fast, and accurate traffic sign detector. First, a series of online data augmentation strategies is designed in the preprocessing module for model training. Second, the hybrid channel and spatial attention module (CSAM) is integrated into the backbone to improve feature extraction. Third, the channel attention module (CAM) is used in the detection head for more efficient feature fusion. To validate the approach, extensive experiments are conducted on the Tsinghua-Tencent 100K dataset. The method achieves state-of-the-art performance with only negligible increases in model parameters and computational overhead. Specifically, the mAP@0.5, parameter count, and FLOPs are 85.8%, 7.13 M, and 16.1 G, respectively.

We have developed a series of effective online data augmentation strategies for the traffic sign recognition dataset, which improve model performance without any extra computational overhead at prediction time. To enhance the feature extraction ability of the backbone with almost no extra model complexity, we have developed an efficient CSAM module placed at the beginning of the backbone, built on the hybrid channel and spatial attention mechanism and the residual bottleneck structure. To make better use of the features extracted by the backbone, we combined the channel attention module CAM with the feature pyramid network (FPN) and path aggregation network (PAN) structure to form a multi-scale attention feature fusion detection head.
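
As an illustration of the mAP@0.5 figure reported in the abstract: the "@0.5" means a predicted box counts as a true positive only when it has the correct class and its intersection over union (IoU) with a ground-truth box is at least 0.5. The minimal Python sketch below shows that matching criterion only; it is not the paper's evaluation code. The function names, box coordinates, and class label are hypothetical, and boxes are assumed to be in (x1, y1, x2, y2) pixel format.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (may be empty).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, pred_cls, gt_box, gt_cls, threshold=0.5):
    """A prediction matches when the class agrees and IoU >= threshold."""
    return pred_cls == gt_cls and iou(pred_box, gt_box) >= threshold


# Hypothetical 20x20-pixel ground-truth sign and two shifted predictions.
gt = (100, 100, 120, 120)
shift_4px = (104, 100, 124, 120)  # IoU = 320 / 480, about 0.667
shift_8px = (108, 100, 128, 120)  # IoU = 240 / 560, about 0.429

print(is_true_positive(shift_4px, "sign_a", gt, "sign_a"))  # True
print(is_true_positive(shift_8px, "sign_a", gt, "sign_a"))  # False
```

In this made-up example, an 8-pixel localization error is enough to turn a correct detection of a 20-pixel sign into a miss, which is one way to see why small signs are hard to score well on under this metric.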
Pages: 2064-2077
Number of pages: 14