Event Knowledge in Large Language Models: The Gap Between the Impossible and the Unlikely

Cited by: 17
Authors
Kauf, Carina [1 ,2 ,8 ]
Ivanova, Anna A. [1 ,2 ,3 ]
Rambelli, Giulia [4 ]
Chersoni, Emmanuele [5 ]
She, Jingyuan Selena [1 ,2 ]
Chowdhury, Zawad [6 ]
Fedorenko, Evelina [1 ,2 ]
Lenci, Alessandro [7 ]
Affiliations
[1] MIT, Dept Brain & Cognit Sci, Cambridge, MA USA
[2] MIT, McGovern Inst Brain Res, Cambridge, MA USA
[3] MIT, Comp Sci & Artificial Intelligence Lab, Cambridge, MA USA
[4] Univ Bologna, Dept Modern Languages Literatures & Cultures, Bologna, Italy
[5] Hong Kong Polytech Univ, Dept Chinese & Bilingual Studies, Hong Kong, Peoples R China
[6] Univ Washington, Dept Math, Seattle, WA USA
[7] Univ Pisa, Dept Philol Literature & Linguist, Pisa, Italy
[8] MIT, Dept Brain & Cognit Sci, 43 Vassar St, Cambridge, MA 02139 USA
Keywords
Generalized event knowledge; World knowledge; Plausibility; Typicality; Artificial neural networks; Language models; Syntax; Semantics; Eye movements; Prediction; Integration; Verbs; Representation; Perception; Violations; Memory
DOI
10.1111/cogs.13386
Chinese Library Classification (CLC) code
B84 [Psychology]
Subject classification code
04; 0402
Abstract
Word co-occurrence patterns in language corpora contain a surprising amount of conceptual knowledge. Large language models (LLMs), trained to predict words in context, leverage these patterns to achieve impressive performance on diverse semantic tasks requiring world knowledge. An important but understudied question about LLMs' semantic abilities is whether they acquire generalized knowledge of common events. Here, we test whether five pretrained LLMs (from 2018's BERT to 2023's MPT) assign a higher likelihood to plausible descriptions of agent-patient interactions than to minimally different implausible versions of the same event. Using three curated sets of minimal sentence pairs (total n = 1215), we found that pretrained LLMs possess substantial event knowledge, outperforming other distributional language models. In particular, they almost always assign a higher likelihood to possible versus impossible events (The teacher bought the laptop vs. The laptop bought the teacher). However, LLMs show less consistent preferences for likely versus unlikely events (The nanny tutored the boy vs. The boy tutored the nanny). In follow-up analyses, we show that (i) LLM scores are driven by both plausibility and surface-level sentence features, (ii) LLM scores generalize well across syntactic variants (active vs. passive constructions) but less well across semantic variants (synonymous sentences), (iii) some LLM errors mirror human judgment ambiguity, and (iv) sentence plausibility serves as an organizing dimension in internal LLM representations. Overall, our results show that important aspects of event knowledge naturally emerge from distributional linguistic patterns, but also highlight a gap between representations of possible/impossible and likely/unlikely events.
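The core comparison the abstract describes, scoring both members of a minimal sentence pair under a language model and checking which receives the higher likelihood, is easy to sketch. The Python below is an illustration, not the authors' released pipeline: the model choice (gpt2) and the scoring helper are assumptions, and the masked models the paper also evaluates (e.g., BERT) would need pseudo-log-likelihood scoring rather than this causal-LM score.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative model choice; the paper tests five LLMs, from BERT to MPT.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def sentence_log_likelihood(sentence: str) -> float:
    """Sum of token log-probabilities of the sentence under the model."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels=input_ids, the returned loss is the mean per-token
        # negative log-likelihood over the predicted positions.
        mean_nll = model(ids, labels=ids).loss.item()
    return -mean_nll * (ids.size(1) - 1)  # the first token is never predicted

# Minimal pair from the abstract: a possible vs. an impossible event.
plausible = "The teacher bought the laptop."
implausible = "The laptop bought the teacher."
print(sentence_log_likelihood(plausible) > sentence_log_likelihood(implausible))

Run over all 1,215 curated pairs, the fraction of pairs for which this comparison favors the plausible sentence is the kind of accuracy the abstract summarizes: near ceiling for possible vs. impossible events, but noticeably lower for likely vs. unlikely ones.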
Pages: 40