Probing the link between vision and language in material perception using psychophysics and unsupervised learning

Times Cited: 0
Authors
Liao, Chenxi [1 ]
Sawayama, Masataka [2 ]
Xiao, Bei [3 ]
Affiliations
[1] Amer Univ, Dept Neurosci, Washington, DC 20016 USA
[2] Univ Tokyo, Grad Sch Informat Sci & Technol, Tokyo, Japan
[3] Amer Univ, Dept Comp Sci, Washington, DC USA
Funding
U.S. National Institutes of Health;
Keywords
REPRESENTATION; COLOR; GLOSS;
DOI
10.1371/journal.pcbi.1012481
Chinese Library Classification
Q5 [Biochemistry];
Subject Classification Codes
071010; 081704;
Abstract
We can visually discriminate and recognize a wide range of materials. Meanwhile, we use language to describe what we see and communicate relevant information about the materials. Here, we investigate the relationship between visual judgment and language expression to understand how visual features relate to semantic representations in human cognition. We use deep generative models to generate images of realistic materials. Interpolating between the generative models enables us to systematically create material appearances in both well-defined and ambiguous categories. Using these stimuli, we compared the representations of materials from two behavioral tasks: visual material similarity judgments and free-form verbal descriptions. Our findings reveal a moderate but significant correlation between vision and language on a categorical level. However, analyzing the representations with an unsupervised alignment method, we discover structural differences that arise at the image-to-image level, especially among ambiguous materials morphed between known categories. Moreover, visual judgments exhibit more individual differences compared to verbal descriptions. Our results show that while verbal descriptions capture material qualities on the coarse level, they may not fully convey the visual nuances of material appearances. Analyzing the image representation of materials obtained from various pre-trained deep neural networks, we find that similarity structures in human visual judgments align more closely with those of the vision-language models than purely vision-based models. Our work illustrates the need to consider the vision-language relationship in building a comprehensive model for material perception. Moreover, we propose a novel framework for evaluating the alignment and misalignment between representations from different modalities, leveraging information from human behaviors and computational models.

Materials are building blocks of our environment, granting access to a wide array of visual experiences. The immense diversity, complexity, and versatility of materials present challenges in verbal articulation. To what extent can words convey the richness of visual material perception? What are the salient attributes for communicating about materials? We address these questions by measuring both visual material similarity judgments and free-form verbal descriptions. We use AI models to create a diverse array of plausible visual appearances of familiar and unfamiliar materials. Our findings reveal a moderate vision-language correlation within individual participants, yet a notable discrepancy persists between the two modalities. While verbal descriptions capture material qualities at a coarse categorical level, precise alignment between vision and language at the individual stimulus level is still lacking. These results highlight that visual representations of materials are richer than verbalized semantic features, underscoring the differential roles of language and vision in perception. Lastly, we discover that deep neural networks pre-trained on large-scale datasets can predict human visual similarities at a coarse level, suggesting the general visual representations learned by these networks carry perceptually relevant information for material-relevant tasks.
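The abstract describes comparing similarity structures derived from visual judgments, verbal descriptions, and pre-trained network embeddings, but does not spell out the computation. A minimal sketch of one standard approach to such comparisons (representational similarity analysis) is shown below, assuming one dissimilarity matrix per modality over the same stimulus set; the variable names, the stimulus count, the correlation-distance metric, and the Spearman comparison of upper triangles are illustrative assumptions, not the authors' exact pipeline.

```python
# Hypothetical RSA-style sketch: compare two similarity structures defined
# over the same set of material images. This is NOT the authors' code; the
# data sources and metrics below are assumptions for illustration only.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.stats import spearmanr


def rdm_from_features(features: np.ndarray) -> np.ndarray:
    """Representational dissimilarity matrix (correlation distance) from an
    (n_stimuli x n_features) embedding matrix, e.g. model activations or
    embedded verbal descriptions."""
    return squareform(pdist(features, metric="correlation"))


def rdm_from_similarity(sim: np.ndarray) -> np.ndarray:
    """Convert a behavioral similarity matrix (e.g. averaged pairwise visual
    similarity ratings scaled to [0, 1]) into a dissimilarity matrix."""
    return 1.0 - sim


def compare_rdms(rdm_a: np.ndarray, rdm_b: np.ndarray) -> float:
    """Spearman correlation between the upper triangles of two RDMs."""
    iu = np.triu_indices_from(rdm_a, k=1)
    rho, _ = spearmanr(rdm_a[iu], rdm_b[iu])
    return rho


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 72                                    # number of material images (assumed)
    vision_sim = rng.uniform(0, 1, (n, n))    # stand-in for visual similarity judgments
    vision_sim = (vision_sim + vision_sim.T) / 2
    text_feats = rng.normal(size=(n, 512))    # stand-in for embedded verbal descriptions
    rho = compare_rdms(rdm_from_similarity(vision_sim), rdm_from_features(text_feats))
    print(f"vision-language RDM correlation (toy data): {rho:.3f}")
```

The same `compare_rdms` comparison could, under these assumptions, be applied between the behavioral RDM and RDMs built from vision-only or vision-language network embeddings; note, however, that the image-to-image analysis in the paper uses an unsupervised alignment method, which this correlational sketch does not reproduce.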
Pages: 23
Related Papers
50 records in total
  • [1] Probing vision and language models for construction waste material recognition
    Sun, Ying
    Gu, Zhaolin
    Yang, Sean Bin
    AUTOMATION IN CONSTRUCTION, 2024, 166
  • [2] Are Vision-Language Transformers Learning Multimodal Representations? A Probing Perspective
    Salin, Emmanuelle
    Farah, Badreddine
    Ayache, Stephane
    Favre, Benoit
    THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 11248 - 11257
  • [3] Is Artificial Intelligence a "Learning Material" in Higher Education? Students Perception on Using Artificial Intelligence on Language Learning
    Bantoto, Franchesca Marie
    Flores, Berhana
    Jindal, Mayank
    DIGITAL TECHNOLOGIES AND APPLICATIONS, ICDTA 2024, VOL 2, 2024, 1099 : 56 - 63
  • [4] MATERIAL CHARACTERIZATION BY ULTRASONICS USING UNSUPERVISED COMPETITIVE LEARNING
    CHOU, CP
    HO, B
    SHEU, JT
    PATTERN RECOGNITION LETTERS, 1995, 16 (07) : 769 - 777
  • [5] Probing the Link Between Perception and Oscillations: Lessons from Transcranial Alternating Current Stimulation
    Cabral-Calderin, Yuranny
    Wilke, Melanie
    NEUROSCIENTIST, 2020, 26 (01): : 57 - 73
  • [7] A Cognitive Approach for Robots' Vision Using Unsupervised Learning and Visual Saliency
    Ramik, Dominik M.
    Sabourin, Christophe
    Madani, Kurosh
    ADVANCES IN COMPUTATIONAL INTELLIGENCE, IWANN 2011, PT I, 2011, 6691 : 81 - 88
  • [8] Vision to Language: Captioning Images using Deep Learning
    Charu, Shreyasi
    Mishra, S. P.
    Gandhi, Tapan
    2020 INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND SIGNAL PROCESSING (AISP), 2020,
  • [9] Vision-language constraint graph representation learning for unsupervised vehicle re-identification
    Wang, Dong
    Wang, Qi
    Tu, Zhiwei
    Min, Weidong
    Xiong, Xin
    Zhong, Yuling
    Gai, Di
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 255
  • [10] Students' Perception of Using Online Language Learning Materials
    Zamari, Zarlina Mohd
    Adnan, Airil Haimi Mohd
    Idris, Sheema Liza
    Yusof, Johana
    3RD INTERNATIONAL CONFERENCE ON E-LEARNING (ICEL 2011), 2012, 67 : 611 - 620