Event Knowledge in Large Language Models: The Gap Between the Impossible and the Unlikely

Cited by: 17
Authors
Kauf, Carina [1,2,8]
Ivanova, Anna A. [1,2,3]
Rambelli, Giulia [4]
Chersoni, Emmanuele [5]
She, Jingyuan Selena [1,2]
Chowdhury, Zawad [6]
Fedorenko, Evelina [1,2]
Lenci, Alessandro [7]
Affiliations
[1] MIT, Dept Brain & Cognit Sci, Cambridge, MA USA
[2] MIT, McGovern Inst Brain Res, Cambridge, MA USA
[3] MIT, Comp Sci & Artificial Intelligence Lab, Cambridge, MA USA
[4] Univ Bologna, Dept Modern Languages Literatures & Cultures, Bologna, Italy
[5] Hong Kong Polytech Univ, Dept Chinese & Bilingual Studies, Hong Kong, Peoples R China
[6] Univ Washington, Dept Math, Seattle, WA USA
[7] Univ Pisa, Dept Philol Literature & Linguist, Pisa, Italy
[8] MIT, Dept Brain & Cognit Sci, 43 Vassar St, Cambridge, MA 02139 USA
Keywords
Generalized event knowledge; World knowledge; Plausibility; Typicality; Artificial neural networks; Language models; Syntax; Semantics; Eye movements; Prediction; Integration; Verbs; Representation; Perception; Violations; Memory
DOI
10.1111/cogs.13386
Chinese Library Classification
B84 [Psychology]
Discipline classification codes
04; 0402
Abstract
Word co-occurrence patterns in language corpora contain a surprising amount of conceptual knowledge. Large language models (LLMs), trained to predict words in context, leverage these patterns to achieve impressive performance on diverse semantic tasks requiring world knowledge. An important but understudied question about LLMs' semantic abilities is whether they acquire generalized knowledge of common events. Here, we test whether five pretrained LLMs (from 2018's BERT to 2023's MPT) assign a higher likelihood to plausible descriptions of agent-patient interactions than to minimally different implausible versions of the same event. Using three curated sets of minimal sentence pairs (total n = 1215), we found that pretrained LLMs possess substantial event knowledge, outperforming other distributional language models. In particular, they almost always assign a higher likelihood to possible versus impossible events (The teacher bought the laptop vs. The laptop bought the teacher). However, LLMs show less consistent preferences for likely versus unlikely events (The nanny tutored the boy vs. The boy tutored the nanny). In follow-up analyses, we show that (i) LLM scores are driven by both plausibility and surface-level sentence features, (ii) LLM scores generalize well across syntactic variants (active vs. passive constructions) but less well across semantic variants (synonymous sentences), (iii) some LLM errors mirror human judgment ambiguity, and (iv) sentence plausibility serves as an organizing dimension in internal LLM representations. Overall, our results show that important aspects of event knowledge naturally emerge from distributional linguistic patterns, but also highlight a gap between representations of possible/impossible and likely/unlikely events.
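The core method the abstract describes is a minimal-pair comparison: score two sentences that differ only in which noun is the agent, and check whether the model assigns a higher likelihood to the plausible version. As a self-contained illustration of that comparison logic (not the paper's actual pipeline, which uses pretrained LLMs such as BERT and MPT), the sketch below scores the paper's example pair with a toy add-one-smoothed bigram model trained on a hypothetical three-sentence corpus:

```python
import math
from collections import Counter

# Hypothetical toy corpus standing in for an LLM's training data.
corpus = [
    "the teacher bought the laptop",
    "the teacher graded the exam",
    "the student used the laptop",
]

bigrams = Counter()   # counts of (w_{i-1}, w_i) pairs
contexts = Counter()  # counts of w_{i-1} as a bigram context
vocab = set()
for sent in corpus:
    toks = sent.split()
    vocab.update(toks)
    for a, b in zip(toks, toks[1:]):
        bigrams[(a, b)] += 1
        contexts[a] += 1

V = len(vocab)

def log_likelihood(sentence):
    """Sum of log P(w_i | w_{i-1}) with add-one (Laplace) smoothing."""
    toks = sentence.split()
    return sum(
        math.log((bigrams[(a, b)] + 1) / (contexts[a] + V))
        for a, b in zip(toks, toks[1:])
    )

# Minimal pair from the abstract: possible vs. impossible event.
plausible = log_likelihood("the teacher bought the laptop")
implausible = log_likelihood("the laptop bought the teacher")
print(plausible > implausible)  # the toy model prefers the plausible event
```

With real LLMs, `log_likelihood` would instead sum token log-probabilities from the model; the decision rule (higher score for the plausible member of the pair) is the same.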
Pages: 40