Probing vision and language models for construction waste material recognition

Cited by: 1
Authors
Sun, Ying [1, 2]
Gu, Zhaolin [1]
Yang, Sean Bin [2, 3]
Affiliations
[1] Xi An Jiao Tong Univ, Sch Human Settlement & Civil Engn, Xian 710049, Peoples R China
[2] Chongqing Univ Posts & Telecommun, Chongqing 400065, Peoples R China
[3] Aalborg Univ, Dept Comp Sci, DK-9220 Aalborg, Denmark
Keywords
Automatic sorting system; Vision and language models; Bidirectional contrastive training; Construction material recognition
DOI
10.1016/j.autcon.2024.105629
Chinese Library Classification number
TU [Building Science]
Discipline classification code
0813
Abstract
Motivated by the critical role of automatic sorting in construction waste management, recent work has leveraged deep learning's ability to capture powerful features within unimodality-based recognition approaches. However, existing methods remain limited because they rely solely on image-based datasets, which restricts feature expression. To address this, this paper introduces the VL-CSW dataset, which covers both image and text modalities. It then proposes ConCLIP, a vision-and-language model tailored for CSW recognition. ConCLIP incorporates a pre-feature interaction network for enhanced modality-specific feature learning and leverages a bidirectional contrastive training paradigm alongside supervised task training to optimize its performance across both modalities. Evaluation on the VL-CSW datasets demonstrates ConCLIP's superiority on the CSW material classification task, significantly outperforming strong baselines in most settings. Notably, ConCLIP achieves performance improvements of 1.83% and 3.41% over unimodality methods in the VL-Concrete and VL-Metal classification tasks, respectively, highlighting the efficacy of multi-modality in enhancing automatic sorting system performance.
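The abstract names a bidirectional contrastive training paradigm but gives no formula. As a rough illustration only, the sketch below assumes a CLIP-style symmetric InfoNCE loss over a batch of matched image and text embeddings; this record does not confirm that ConCLIP uses this exact form. The function name `bidirectional_contrastive_loss`, the helper names, and the temperature of 0.07 are all hypothetical choices made for the example.

```python
import math


def _normalize(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def _log_softmax(row):
    """Numerically stable log-softmax of a list of logits."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def bidirectional_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Assumed symmetric InfoNCE loss: pair k of images matches pair k of texts.

    This is an illustrative stand-in, not the paper's implementation.
    """
    if not image_embs or len(image_embs) != len(text_embs):
        raise ValueError("need equally sized, non-empty batches")
    imgs = [_normalize(v) for v in image_embs]
    txts = [_normalize(v) for v in text_embs]
    n = len(imgs)

    # Cosine similarity of every image with every text, scaled by temperature.
    logits = [
        [sum(a * b for a, b in zip(i, t)) / temperature for t in txts]
        for i in imgs
    ]

    # Image-to-text direction: each row should pick its diagonal entry.
    i2t = -sum(_log_softmax(logits[k])[k] for k in range(n)) / n

    # Text-to-image direction: same computation on the transposed matrix.
    logits_t = [list(col) for col in zip(*logits)]
    t2i = -sum(_log_softmax(logits_t[k])[k] for k in range(n)) / n

    # Average the two directions so neither modality dominates.
    return 0.5 * (i2t + t2i)


if __name__ == "__main__":
    aligned = bidirectional_contrastive_loss([[1, 0], [0, 1]], [[1, 0], [0, 1]])
    swapped = bidirectional_contrastive_loss([[1, 0], [0, 1]], [[0, 1], [1, 0]])
    print(aligned < swapped)
```

In a combined objective such as the one the abstract describes, a term like this would typically be added to a supervised classification loss; the weighting between the two is not stated in this record.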
Pages: 14
Related papers
50 in total
  • [21] Rectify representation bias in vision-language models for long-tailed recognition
    Li, Bo
    Yao, Yongqiang
    Tan, Jingru
    Gong, Ruihao
    Lu, Jianwei
    Luo, Ye
    NEURAL NETWORKS, 2024, 172
  • [22] Temporal Modeling Approach for Video Action Recognition Based on Vision-language Models
    Huang, Yue
    Gu, Xiaodong
    NEURAL INFORMATION PROCESSING, ICONIP 2023, PT III, 2024, 14449 : 512 - 523
  • [23] What Do Language Models Hear? Probing for Auditory Representations in Language Models
    Ngo, Jerry
    Kim, Yoon
    PROCEEDINGS OF THE 62ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS, 2024, : 5435 - 5448
  • [24] Vision-Language Models for Vision Tasks: A Survey
    Zhang, Jingyi
    Huang, Jiaxing
    Jin, Sheng
    Lu, Shijian
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (08) : 5625 - 5644
  • [25] Analyzing the Robustness of Vision & Language Models
    Shirnin, Alexander
    Andreev, Nikita
    Potapova, Sofia
    Artemova, Ekaterina
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 2751 - 2763
  • [26] Construction and Practical Significance of Language World Vision
    Wei, Yan-li
    PROCEEDINGS OF THE SEVENTH NORTHEAST ASIA INTERNATIONAL SYMPOSIUM ON LANGUAGE, LITERATURE AND TRANSLATION, 2018, : 488 - 494
  • [27] Probing for Bridging Inference in Transformer Language Models
    Pandit, Onkar
    Hou, Yufang
    2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL-HLT 2021), 2021, : 4153 - 4163
  • [28] Probing Pretrained Language Models for Lexical Semantics
    Vulić, Ivan
    Ponti, Edoardo M.
    Litschko, Robert
    Glavaš, Goran
    Korhonen, Anna
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 7222 - 7240
  • [29] Construction of a knowledge graph for framework material enabled by large language models and its application
    Bai, Xuefeng
    He, Song
    Li, Yi
    Xie, Yabo
    Zhang, Xin
    Du, Wenli
    Li, Jian-Rong
    NPJ COMPUTATIONAL MATERIALS, 2025, 11 (01)
  • [30] Probing Pretrained Language Models with Hierarchy Properties
    Lovon-Melgarejo, Jesus
    Moreno, Jose G.
    Besancon, Romaric
    Ferret, Olivier
    Tamine, Lynda
    ADVANCES IN INFORMATION RETRIEVAL, ECIR 2024, PT II, 2024, 14609 : 126 - 142