Crack image classification and information extraction in steel bridges using multimodal large language models

Times cited: 2

Authors:

Wang, Xiao [1,2]

Yue, Qingrui [1,2]

Liu, Xiaogang [1]

Affiliations:

[1] Univ Sci & Technol Beijing, Res Inst Urbanizat & Urban Safety, Sch Future Cities, Beijing 100083, Peoples R China

[2] Tianjin Univ, Sch Civil Engn, Tianjin 300350, Peoples R China

Source:

AUTOMATION IN CONSTRUCTION | 2025 / Vol. 171

Funding:

National Natural Science Foundation of China;

Keywords:

Steel bridge cracks; Multimodal large language models; Zero-shot detection; Deep-learning; Visual prompts;

DOI:

10.1016/j.autcon.2025.105995

Chinese Library Classification (CLC):

TU [Building Science];

Discipline classification code:

0813 ;

Abstract:

Existing deep learning methods fail to meet the requirements of zero-shot learning scenarios for crack detection and have yet to investigate the specific impact of visual prompts on the detection performance of multimodal large language models (MLLMs). This paper proposes a cascaded crack detection strategy based on MLLMs, decomposing the crack detection task into a stepwise classification process from the image level to patch level. The crack detection performance of five MLLMs and five traditional deep-learning models was systematically evaluated, while the influence of visual prompt quantity and design on model performance was examined. The results indicate that MLLMs achieve performance comparable to deep learning models in image-level crack detection. However, in finer-grained patch-level crack detection, their performance still needs to catch up to that achieved by deep learning models based on Segmented Transformer. Increasing the number of visual prompts can partially improve the classification performance of MLLMs.
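The cascaded strategy described in the abstract (an image-level decision first, then patch-level classification) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `split_into_patches`, `cascaded_detect`, and `toy_classifier` are hypothetical, the image is assumed to be a plain 2-D grid of intensities, and the classifier is a stand-in callable where a real pipeline would query an MLLM.

```python
from typing import Callable, List, Sequence, Tuple

# Assumption: an image is a 2-D grid of pixel intensities in [0, 1].
Image = Sequence[Sequence[float]]
# Assumption: a classifier returns True when it judges a region to contain a crack.
# In a real pipeline this callable would wrap a query to an MLLM.
Classifier = Callable[[Image], bool]


def split_into_patches(image: Image, patch_size: int) -> List[Tuple[Tuple[int, int], Image]]:
    """Cut the image into non-overlapping square tiles, keyed by (row, col) tile index."""
    patches = []
    for top in range(0, len(image), patch_size):
        for left in range(0, len(image[0]), patch_size):
            patch = [row[left:left + patch_size] for row in image[top:top + patch_size]]
            patches.append(((top // patch_size, left // patch_size), patch))
    return patches


def cascaded_detect(image: Image, classify: Classifier, patch_size: int) -> List[Tuple[int, int]]:
    """Stage 1 classifies the whole image; stage 2 runs only when stage 1 is positive."""
    if not classify(image):
        return []  # no crack at image level, so patches are never examined
    return [index for index, patch in split_into_patches(image, patch_size) if classify(patch)]


def toy_classifier(region: Image) -> bool:
    """Placeholder rule (hypothetical): flag a region containing any dark pixel."""
    return any(value < 0.2 for row in region for value in row)
```

Running the image-level check first means a negative image costs a single classifier call; the patch-level pass is paid for only on images already flagged as cracked.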


Pages: 17
