STrans-YOLOX: Fusing Swin Transformer and YOLOX for Automatic Pavement Crack Detection

Cited by: 20
Authors
Luo, Hui [1]
Li, Jiamin [1]
Cai, Lianming [1]
Wu, Mingquan [1]
Affiliation
[1] East China Jiaotong Univ, Sch Informat Engn, Nanchang 330013, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2023 / Vol. 13 / Issue 3
Funding
National Natural Science Foundation of China;
Keywords
pavement crack detection; object detection; Swin Transformer; YOLOX; global guidance attention; multi-scale feature fusion; NMS; complex scenes;
DOI
10.3390/app13031999
Chinese Library Classification number
O6 [Chemistry];
Discipline classification code
0703 ;
Abstract
Automatic pavement crack detection is crucial for reducing road maintenance costs and ensuring transportation safety. Although convolutional neural networks (CNNs) have been widely used in automatic pavement crack detection, they cannot adequately model the long-range dependencies between pixels and easily lose edge detail information in complex scenes. Moreover, irregular crack shapes also make the detection task challenging. To address these issues, an automatic pavement crack detection architecture named STrans-YOLOX is proposed. Specifically, the architecture first exploits the CNN backbone to extract feature information, preserving the local modeling ability of the CNN. Then, Swin Transformer is introduced to enhance the long-range dependencies through a self-attention mechanism by supplying each pixel with global features. A new global attention guidance module (GAGM) is used to ensure effective information propagation in the feature pyramid network (FPN) by using high-level semantic information to guide the low-level spatial information, thereby enhancing the multi-class and multi-scale features of cracks. During the post-processing stage, we utilize alpha-IoU-NMS to achieve the accurate suppression of the detection boxes in the case of occlusion and overlapping objects by introducing an adjustable power parameter. The experiments demonstrate that the proposed STrans-YOLOX achieves 63.37% mAP and surpasses the state-of-the-art models on the challenging pavement crack dataset.
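The post-processing step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that alpha-IoU-NMS is ordinary greedy NMS in which the overlap score compared against the threshold is IoU raised to an adjustable power `alpha`; the abstract does not give the exact formulation, so the function names, the box format `(x1, y1, x2, y2)`, and the default values below are hypothetical and are not taken from the paper.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def alpha_iou_nms(boxes, scores, threshold=0.5, alpha=2.0):
    """Greedy NMS using IoU**alpha as the overlap score (assumed formulation).

    Returns the indices of the kept boxes, highest score first.
    """
    # Visit candidates in descending score order.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop every remaining box whose powered overlap with the kept box
        # exceeds the threshold. Since IoU lies in [0, 1], alpha > 1 shrinks
        # the overlap score, so fewer overlapping boxes are suppressed.
        order = [i for i in order
                 if iou(boxes[best], boxes[i]) ** alpha <= threshold]
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 0, 11, 10), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    print(alpha_iou_nms(boxes, scores, threshold=0.5, alpha=1.0))
    print(alpha_iou_nms(boxes, scores, threshold=0.5, alpha=4.0))
```

With `alpha=1.0` this reduces to standard IoU-based NMS. Under this assumed formulation, a larger `alpha` makes suppression more lenient for heavily overlapping boxes, which is one plausible way an adjustable power parameter could help with occluded or overlapping cracks.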
Pages: 17