A weakly-supervised transformer-based hybrid network with multi-attention for pavement crack detection

Citations: 29
Authors
Wang, Zhenlin [1]
Leng, Zhufei [2]
Zhang, Zhixin [1]
Affiliations
[1] Univ Elect Sci & Technol China, Sch Informat & Commun Engn, Chengdu 611731, Sichuan, Peoples R China
[2] Univ Elect Sci & Technol China, Sch Mech & Elect Engn, Chengdu 611731, Sichuan, Peoples R China
Keywords
Pavement crack segmentation; Semantic segmentation; Transformer; Weakly-supervised learning; Convolutional neural network (CNN); Deep learning; Architecture
DOI
10.1016/j.conbuildmat.2023.134134
Chinese Library Classification (CLC) number
TU [Building Science]
Discipline classification code
0813
Abstract
Crack detection is of great importance for infrastructure maintenance, and roads are among the most crucial kinds of infrastructure in China. Road safety accidents, which are mainly caused by cracks, significantly affect people's property, personal safety, and the economic development of society. It is therefore essential to identify pavement defects accurately and repair them promptly in order to prolong the lifespan of the road, minimize maintenance expenses, prevent further deterioration, and reduce the occurrence of hazards. In recent years, deep neural networks have achieved considerable success in crack detection, saving substantial manpower, time, and money compared with conventional approaches. Nevertheless, owing to difficulties such as time-consuming pixel-level annotation, inadequate information acquisition, discontinuous cracks, and low-quality images, the detection of pavement defects remains a great challenge, and several open problems still call for better solutions. To this end, we propose a novel weakly-supervised hybrid network with multi-attention, termed CGTr-Net, for pavement crack detection. The backbone, CG-Trans, is designed to alleviate the loss of information and to extract both local and global features well. It combines a Convolutional Neural Network (CNN), which excels at extracting local features but has difficulty capturing global representations, with a Gated Axial Transformer, whose gated position-sensitive axial attention mechanism efficiently extracts long-distance feature dependencies but is weaker at capturing local feature details. To enhance feature fusion between the Transformer layer and the convolution layer, a feature fusion module (TCFF) is added to the network. The two feature maps obtained from the Transformer and the CNN are used to generate Grad-CAM maps. Subsequently, we use a Conditional Random Field (CRF) to further refine the Grad-CAM maps and adapt Affinity from Attention (AFA), which learns semantic affinity from the Gated Axial Transformer and the CNN, to produce more accurate pseudo labels. The proposed CGTr-Net is evaluated on two different crack segmentation benchmark datasets, where it achieves the highest Recall (Re), F-score (F1), and mean intersection-over-union (mIoU), surpassing all competitors in the experiments. These results demonstrate the robustness, effectiveness, and superiority of CGTr-Net compared with existing state-of-the-art methods.
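For readers unfamiliar with the three reported metrics, the following is a minimal sketch of how Recall, F1, and mIoU can be computed for a binary crack mask. It is an illustration only, not the authors' evaluation code: the function name `crack_metrics` is hypothetical, masks are assumed to be nested lists with 1 for crack and 0 for background, and mIoU is assumed to be the average of the crack and background IoU (the paper may define it differently).

```python
from typing import Dict, Sequence


def crack_metrics(pred: Sequence[Sequence[int]],
                  truth: Sequence[Sequence[int]]) -> Dict[str, float]:
    """Recall, F1 and mIoU for binary masks (1 = crack, 0 = background).

    Hypothetical helper for illustration; not taken from the paper.
    """
    tp = fp = fn = tn = 0
    # Count pixel-wise agreement between prediction and ground truth.
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1

    def ratio(num: float, den: float) -> float:
        # Return 0.0 instead of dividing by zero on empty classes.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    f1 = ratio(2 * precision * recall, precision + recall)
    iou_crack = ratio(tp, tp + fp + fn)
    iou_background = ratio(tn, tn + fp + fn)
    # Assumption: mIoU is the mean over the two classes.
    miou = (iou_crack + iou_background) / 2

    return {"recall": recall, "f1": f1, "miou": miou}


if __name__ == "__main__":
    predicted = [[1, 1, 0],
                 [0, 0, 0]]
    reference = [[1, 0, 0],
                 [1, 0, 0]]
    print(crack_metrics(predicted, reference))
```

Because crack pixels usually occupy a small fraction of a pavement image, the background IoU tends to be high, so the crack-class IoU and F1 are the more sensitive indicators when comparing methods.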
Pages: 11