VisualSiteDiary: A detector-free Vision-Language Transformer model for captioning photologs for daily construction reporting and image retrievals

Cited by: 5

Authors:

Jung, Yoonhwa [1]

Cho, Ikhyun [2]

Hsu, Shun-Hsiang [3]

Golparvar-Fard, Mani [4]

Affiliations:

[1] Univ Illinois, Dept Civil & Environm Engn & Comp Sci, Urbana, IL 61801 USA

[2] Univ Illinois, Dept Comp Sci, Urbana, IL 61801 USA

[3] Univ Illinois, Dept Civil & Environm Engn, Urbana, IL 61801 USA

[4] Univ Illinois, Dept Civil & Environm Engn, Comp Sci & Tech Entrepreneurship, Urbana, IL 61801 USA

Source:

AUTOMATION IN CONSTRUCTION | 2024 / Vol. 165

Funding:

U.S. National Science Foundation;

Keywords:

Computer vision; Project controls; Image captioning; Project management; Machine learning; Natural language generation; Artificial intelligence; SAFETY;

DOI:

10.1016/j.autcon.2024.105483

Chinese Library Classification (CLC) number:

TU [Building Science];

Discipline classification code:

0813;

Abstract:

This paper presents VisualSiteDiary, a Vision Transformer-based image captioning model which creates human-readable captions for daily progress and work activity log, and enhances image retrieval tasks. As a model for deciphering construction photologs, VisualSiteDiary incorporates pseudo-region features, utilizes high-level knowledge in pretraining, and fine-tunes for diverse captioning styles. To validate VisualSiteDiary, a new image captioning dataset, VSD, is presented. This dataset includes many realistic yet challenging cases commonly observed in commercial building projects. Experimental results using five different metrics demonstrate that VisualSiteDiary provides superior-quality captions compared to the state-of-the-art image captioning models. Excluding the task of object recognition, the presented model also outperformed mPLUG, the state-of-the-art visual-language model, in the image retrieval task by 0.6% in precision and 0.9% in recall, respectively. Detailed discussions illustrate practical examples on how VisualSiteDiary improves the process of creating daily construction reports, paving the way for future developments in the field.

Pages: 19

References: 84 in total

