Event Knowledge in Large Language Models: The Gap Between the Impossible and the Unlikely

Cited by: 17
Authors
Kauf, Carina [1,2,8]
Ivanova, Anna A. [1,2,3]
Rambelli, Giulia [4]
Chersoni, Emmanuele [5]
She, Jingyuan Selena [1,2]
Chowdhury, Zawad [6]
Fedorenko, Evelina [1,2]
Lenci, Alessandro [7]
Affiliations
[1] MIT, Dept Brain & Cognit Sci, Cambridge, MA USA
[2] MIT, McGovern Inst Brain Res, Cambridge, MA USA
[3] MIT, Comp Sci & Artificial Intelligence Lab, Cambridge, MA USA
[4] Univ Bologna, Dept Modern Languages Literatures & Cultures, Bologna, Italy
[5] Hong Kong Polytech Univ, Dept Chinese & Bilingual Studies, Hong Kong, Peoples R China
[6] Univ Washington, Dept Math, Seattle, WA USA
[7] Univ Pisa, Dept Philol Literature & Linguist, Pisa, Italy
[8] MIT, Dept Brain & Cognit Sci, 43 Vassar St, Cambridge, MA 02139 USA
Keywords
Generalized event knowledge; World knowledge; Plausibility; Typicality; Artificial neural networks; Language models; Syntax; Semantics; Eye-movements; Prediction; Integration; Verbs; Representation; Perception; Violations; Memory
DOI
10.1111/cogs.13386
Chinese Library Classification
B84 [Psychology]
Discipline Classification Code
04; 0402
Abstract
Word co-occurrence patterns in language corpora contain a surprising amount of conceptual knowledge. Large language models (LLMs), trained to predict words in context, leverage these patterns to achieve impressive performance on diverse semantic tasks requiring world knowledge. An important but understudied question about LLMs' semantic abilities is whether they acquire generalized knowledge of common events. Here, we test whether five pretrained LLMs (from 2018's BERT to 2023's MPT) assign a higher likelihood to plausible descriptions of agent-patient interactions than to minimally different implausible versions of the same event. Using three curated sets of minimal sentence pairs (total n = 1215), we found that pretrained LLMs possess substantial event knowledge, outperforming other distributional language models. In particular, they almost always assign a higher likelihood to possible versus impossible events (The teacher bought the laptop vs. The laptop bought the teacher). However, LLMs show less consistent preferences for likely versus unlikely events (The nanny tutored the boy vs. The boy tutored the nanny). In follow-up analyses, we show that (i) LLM scores are driven by both plausibility and surface-level sentence features, (ii) LLM scores generalize well across syntactic variants (active vs. passive constructions) but less well across semantic variants (synonymous sentences), (iii) some LLM errors mirror human judgment ambiguity, and (iv) sentence plausibility serves as an organizing dimension in internal LLM representations. Overall, our results show that important aspects of event knowledge naturally emerge from distributional linguistic patterns, but also highlight a gap between representations of possible/impossible and likely/unlikely events.
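The abstract's core measure, whether an LLM assigns a higher likelihood to the plausible member of a minimal sentence pair, can be illustrated with a short sketch. The model choice (gpt2), the summed token log-likelihood score, and the helper name below are illustrative assumptions, not the paper's exact protocol (the study evaluates five pretrained LLMs with its own scoring procedures).

# Minimal sketch: score each sentence of a minimal pair by its summed token
# log-likelihood under a causal LM, then compare the two scores.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def sentence_log_likelihood(sentence: str) -> float:
    # Sum of log P(token_i | preceding tokens) over the sentence.
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        # With labels=input_ids, the model returns the mean cross-entropy over
        # the predicted tokens; multiplying by their count recovers the sum.
        outputs = model(**inputs, labels=inputs["input_ids"])
    n_predicted = inputs["input_ids"].shape[1] - 1
    return float(-outputs.loss * n_predicted)

# Example minimal pair from the abstract: the possible event should score higher.
plausible = "The teacher bought the laptop."
implausible = "The laptop bought the teacher."
print(sentence_log_likelihood(plausible) - sentence_log_likelihood(implausible))

A positive difference indicates that the model prefers the possible event, which is the comparison the paper reports LLMs getting right almost always, in contrast to the less consistent likely-versus-unlikely comparison.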
Pages: 40