Tunnel crack segmentation based on lightweight Transformer

Cited by: 0

Authors:

Kuang, Xianyan [1]

Xu, Yaoming [1]

Lei, Hui [1]

Cheng, Fujun [1]

Huan, Xianglan [1]

Affiliations:

[1] School of Electrical Engineering and Automation, Jiangxi University of Science and Technology, Ganzhou

Source:

Journal of Railway Science and Engineering | 2024, Vol. 21, No. 08

Keywords:

atrous spatial pyramid pooling; crack segmentation; lightweight model; MobileViT; Transformer

DOI:

10.19713/j.cnki.43-1423/u.T20231768

Abstract:

Crack detection is crucial to the safety of tunnel structures, and timely detection of tunnel crack defects helps reduce maintenance costs and protect traffic safety. However, traditional convolutional neural networks for tunnel crack detection mainly pursue higher detection accuracy, which comes with greater algorithmic complexity. Balancing accuracy against real-time performance remains a difficult point in current research. To address this problem, this paper proposes CrackViT, a crack segmentation method based on a lightweight Transformer. First, the MobileViT network, a hybrid of convolutional neural networks and Transformer, is used to build the crack feature extraction network. This reduces the number of model parameters and the amount of computation while efficiently extracting both the global information and the local features of the crack image. Then, an improved atrous spatial pyramid pooling decoder is proposed to extract and fuse features at different scales and to produce a pixel-level probability distribution. Because crack images suffer from missing detail information, an efficient channel attention module is introduced to strengthen the extraction of crack features. In addition, an online hard example mining loss function is designed to mitigate the imbalance between the crack and background classes. Experimental results show that CrackViT achieves 75.62% IoU on the crack dataset at 63 FPS on a single 3050Ti GPU, with only 2.43 M model parameters. The CrackViT-L model achieves an IoU of 76.83% with 3.56 M parameters and an inference speed of 61 FPS. The tested accuracy of the algorithm is better than that of most mainstream models, and it requires fewer model parameters. The results also show that the edges of the tunnel crack segmentation images predicted by CrackViT are clearer and more complete, and that cracks are detected effectively while inference speed is maintained, which makes the algorithm useful for practical tunnel crack detection. © 2024, Central South University Press. All rights reserved.
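The abstract names two quantities that are easy to illustrate: a loss that concentrates on hard pixels to counter the crack/background imbalance, and the IoU metric used to report accuracy. The following is a minimal pure-Python sketch of both, not the paper's implementation. The function names, the probability threshold of 0.7, and the `min_kept` fallback are assumptions borrowed from common segmentation practice; the abstract does not state how CrackViT's loss is parameterized.

```python
import math


def binary_cross_entropy(p, y, eps=1e-7):
    """Per-pixel binary cross-entropy for crack probability p and label y (0 or 1)."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def ohem_loss(probs, labels, prob_threshold=0.7, min_kept=2):
    """Average the loss over hard pixels only (hypothetical parameterization).

    A pixel counts as hard when the probability assigned to its true class
    is below prob_threshold, i.e. its loss exceeds -log(prob_threshold).
    If fewer than min_kept pixels are hard, the min_kept largest losses
    are used instead so the average never runs over too few pixels.
    """
    losses = sorted(
        (binary_cross_entropy(p, y) for p, y in zip(probs, labels)),
        reverse=True,
    )
    if not losses:
        return 0.0
    loss_threshold = -math.log(prob_threshold)
    hard = [value for value in losses if value > loss_threshold]
    if len(hard) < min_kept:
        hard = losses[:min_kept]
    return sum(hard) / len(hard)


def crack_iou(pred_mask, true_mask):
    """Intersection over union of the crack class for flat binary masks."""
    pairs = list(zip(pred_mask, true_mask))
    intersection = sum(1 for p, t in pairs if p == 1 and t == 1)
    union = sum(1 for p, t in pairs if p == 1 or t == 1)
    return intersection / union if union else 1.0


if __name__ == "__main__":
    # Six pixels: predicted crack probabilities and ground-truth labels.
    probs = [0.9, 0.2, 0.6, 0.05, 0.1, 0.95]
    labels = [1, 1, 1, 0, 0, 0]
    print("OHEM loss:", ohem_loss(probs, labels))
    print("IoU:", crack_iou([1, 1, 0, 0], [1, 0, 1, 0]))
```

In this sketch the well-classified pixels, which are mostly background in crack images, drop out of the average, so the sparse crack pixels and ambiguous edges dominate the gradient signal. A real training setup would apply the same selection to tensors of per-pixel losses rather than Python lists.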

References

Pages: 3421-3433

Page count: 12

24 entries in total

[1] TANG Qianlong, TAN Yuan, PENG Limin, et al., On crack identification method for tunnel linings based on digital image technology[J], Journal of Railway Science and Engineering, 16, 12, (2019)
[2] KULKARNI S, SINGH S, BALAKRISHNAN D, et al., CrackSeg9k: a collection and benchmark for crack segmentation datasets and frameworks[C], European Conference on Computer Vision, pp. 179-195, (2023)
[3] BADRINARAYANAN V, KENDALL A, CIPOLLA R, SegNet: a deep convolutional encoder-decoder architecture for image segmentation[J], IEEE Transactions on Pattern Analysis and Machine Intelligence, 39, 12, pp. 2481-2495, (2017)
[4] EHTISHAM R, MIR J, CHAIRMAN N, et al., Evaluation of pre-trained ResNet and MobileNetV2 CNN models for the concrete crack detection and crack orientation classification[C], Proceedings of the 1st International Conference on Advances in Civil and Environmental Engineering, Taxila, Pakistan, pp. 22-23, (2022)
[5] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al., An image is worth 16x16 words: transformers for image recognition at scale, (2020)
[6] MEHTA S, RASTEGARI M, MobileViT: light-weight, general-purpose, and mobile-friendly vision transformer, (2021)
[7] CAI Han, LI Junyan, HU Muyan, et al., EfficientViT: lightweight multi-scale attention for high-resolution dense prediction[C], 2023 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 17256-17267, (2023)
[8] SUN Xinzi, XIE Yuanchang, JIANG Liming, et al., DMA-Net: DeepLab with multi-scale attention for pavement crack segmentation[J], IEEE Transactions on Intelligent Transportation Systems, 23, 10, pp. 18392-18403, (2022)
[9] PAN Huihui, HONG Yuanduo, SUN Weichao, et al., Deep dual-resolution networks for real-time and accurate semantic segmentation of traffic scenes[J], IEEE Transactions on Intelligent Transportation Systems, 24, 3, (2023)
[10] XIE Enze, WANG Wenhai, YU Zhiding, et al., SegFormer: simple and efficient design for semantic segmentation with transformers, (2021)
