RailTrack-DaViT: A Vision Transformer-Based Approach for Automated Railway Track Defect Detection

Cited by: 3

Authors:

Phaphuangwittayakul, Aniwat [1,2]

Harnpornchai, Napat [3]

Ying, Fangli [4]

Zhang, Jinming [1]

Affiliations:

[1] Chiang Mai Univ, Int Coll Digital Innovat, Chiang Mai 50200, Thailand

[2] Lancang Mekong Digital Intelligence Shijiazhuang T, Shijiazhuang 051230, Peoples R China

[3] Chiang Mai Univ, Fac Econ, Chiang Mai 50200, Thailand

[4] East China Univ Sci & Technol, Dept Comp Sci & Engn, State Key Lab Bioreactor Engn, Shanghai 200237, Peoples R China

Source:

JOURNAL OF IMAGING | 2024 / Vol. 10 / Issue 08

Keywords:

railway track inspection; vision transformer; computer vision; transportation safety; public transportation monitoring

DOI:

10.3390/jimaging10080192

Chinese Library Classification (CLC) number:

TB8 [Photographic technology]

Discipline classification code:

0804

Abstract:

Railway track defects pose significant safety risks and can lead to accidents, economic losses, and loss of life. Traditional manual inspection methods are time-consuming, costly, or prone to human error. This paper proposes RailTrack-DaViT, a novel vision transformer-based approach for railway track defect classification. By leveraging the Dual Attention Vision Transformer (DaViT) architecture, RailTrack-DaViT effectively captures both global and local information, enabling accurate defect detection. The model is trained and evaluated on multiple datasets: rail, fastener and fishplate, multi-faults, and ThaiRailTrack. A comprehensive analysis of the model's performance is provided, including confusion matrices, training visualizations, and classification metrics. RailTrack-DaViT demonstrates superior performance compared to state-of-the-art CNN-based methods, achieving the highest accuracies: 96.9% on the rail dataset, 98.9% on the fastener and fishplate dataset, and 98.8% on the multi-faults dataset. Moreover, RailTrack-DaViT outperforms baselines on the ThaiRailTrack dataset with 99.2% accuracy, quickly adapts to unseen images, and shows better model stability during fine-tuning. This capability can significantly reduce time consumption when applying the model to novel datasets in practical applications.
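The abstract reports results as confusion matrices and classification metrics. As a minimal sketch of how such figures are typically derived, the Python below builds a confusion matrix from true and predicted labels and computes accuracy plus per-class precision, recall, and F1. The function names, the two-class labelling (0 = normal, 1 = defective), and the toy labels are assumptions made for illustration; they are not taken from the paper, and the paper's own evaluation code may differ.

```python
from typing import Dict, List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """Count predictions; rows are true classes, columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def classification_metrics(matrix: List[List[int]]) -> Dict[str, object]:
    """Derive accuracy and per-class precision, recall, and F1 from a confusion matrix."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    precision: List[float] = []
    recall: List[float] = []
    f1: List[float] = []
    for i in range(n):
        tp = matrix[i][i]
        predicted_i = sum(matrix[r][i] for r in range(n))  # column sum
        actual_i = sum(matrix[i])                          # row sum
        p = tp / predicted_i if predicted_i else 0.0
        r = tp / actual_i if actual_i else 0.0
        precision.append(p)
        recall.append(r)
        f1.append(2 * p * r / (p + r) if (p + r) else 0.0)
    return {
        "accuracy": correct / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical toy labels (0 = normal, 1 = defective); not data from the paper.
toy_true = [0, 0, 0, 0, 1, 1, 1, 1]
toy_pred = [0, 0, 0, 1, 1, 1, 1, 0]
toy_matrix = confusion_matrix(toy_true, toy_pred, num_classes=2)
toy_metrics = classification_metrics(toy_matrix)
print(toy_matrix)               # [[3, 1], [1, 3]]
print(toy_metrics["accuracy"])  # 0.75
```

Accuracy alone can hide weak performance on a rare defect class, which is why per-class precision and recall are usually read alongside the headline accuracy figure.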

Pages: 27

References: 43 in total
