Probing vision and language models for construction waste material recognition

Cited by: 1
Authors
Sun, Ying [1, 2]
Gu, Zhaolin [1]
Yang, Sean Bin [2, 3]
Affiliations
[1] Xi An Jiao Tong Univ, Sch Human Settlement & Civil Engn, Xian 710049, Peoples R China
[2] Chongqing Univ Posts & Telecommun, Chongqing 400065, Peoples R China
[3] Aalborg Univ, Dept Comp Sci, DK-9220 Aalborg, Denmark
Keywords
Automatic sorting system; Vision and language models; Bidirectional contrastive training; Construction material recognition
DOI
10.1016/j.autcon.2024.105629
Chinese Library Classification (CLC) number
TU [Building Science]
Discipline classification code
0813
Abstract
Motivated by the critical role of automatic sorting in construction waste management, recent work has leveraged deep learning's ability to capture powerful features within unimodal recognition approaches. However, existing methods remain limited because they rely solely on image-based datasets, which restricts feature expression. To address this, this paper introduces the VL-CSW dataset, which covers both image and text modalities. It then proposes ConCLIP, a vision-and-language model tailored for CSW recognition. ConCLIP incorporates a pre-feature interaction network for enhanced modality-specific feature learning and combines a bidirectional contrastive training paradigm with supervised task training to optimize its performance across both modalities. Evaluation on the VL-CSW datasets demonstrates ConCLIP's superiority on the CSW material classification task, where it significantly outperforms strong baselines in most settings. Notably, ConCLIP achieves performance improvements of 1.83% and 3.41% over unimodal methods on the VL-Concrete and VL-Metal classification tasks, respectively, highlighting the efficacy of multi-modality in enhancing automatic sorting system performance.
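The abstract does not spell out the form of the bidirectional contrastive objective. As a rough illustration only, the sketch below assumes the symmetric (image-to-text plus text-to-image) InfoNCE loss popularized by CLIP-style models; the function names, the toy embeddings, and the temperature value of 0.07 are assumptions for this example and are not taken from the paper.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def bidirectional_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Assumed CLIP-style loss: average of image->text and text->image terms.

    image_emb[i] and text_emb[i] are treated as a matched pair; every other
    combination in the batch acts as a negative.
    """
    img = [l2_normalize(v) for v in image_emb]
    txt = [l2_normalize(v) for v in text_emb]
    # Similarity matrix: rows index images, columns index texts.
    logits = [[sum(a * b for a, b in zip(i, t)) / temperature for t in txt]
              for i in img]
    logits_t = [list(col) for col in zip(*logits)]  # rows index texts
    return 0.5 * (cross_entropy_diagonal(logits)
                  + cross_entropy_diagonal(logits_t))


# Toy batch of two image/text pairs with hypothetical 2-D embeddings.
matched = bidirectional_contrastive_loss(
    [[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
mismatched = bidirectional_contrastive_loss(
    [[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

In this toy batch the correctly paired embeddings give a loss close to zero, while swapping the texts gives a large loss. A model described as combining contrastive and supervised training would presumably add a classification loss to this term, but how the paper weights the two is not stated in the abstract.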
Pages: 14