Understanding Social Reasoning in Language Models with Language Models

Cited by: 0
Authors
Gandhi, Kanishk [1 ]
Franken, J.-Philipp [1]
Gerstenberg, Tobias [1 ]
Goodman, Noah D. [1 ]
Affiliations
[1] Stanford Univ, Stanford, CA 94305 USA
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
As Large Language Models (LLMs) become increasingly integrated into our everyday lives, understanding their ability to comprehend human mental states becomes critical for ensuring effective interactions. However, despite recent attempts to assess the Theory-of-Mind (ToM) reasoning capabilities of LLMs, the degree to which these models can align with human ToM remains a nuanced topic of exploration. This is primarily due to two distinct challenges: (1) inconsistent results from previous evaluations, and (2) concerns about the validity of existing evaluation methodologies. To address these challenges, we present a novel framework for procedurally generating evaluations with LLMs by populating causal templates. Using our framework, we create a new social reasoning benchmark (BigToM) for LLMs, which consists of 25 controls and 5,000 model-written evaluations. We find that human participants rate the quality of our benchmark higher than previous crowd-sourced evaluations and comparable to expert-written evaluations. Using BigToM, we evaluate the social reasoning capabilities of a variety of LLMs and compare model performance with human performance. Our results suggest that GPT-4 has ToM capabilities that mirror human inference patterns, though less reliably, while other LLMs struggle.
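To make the abstract's method concrete, below is a minimal Python sketch of populating a causal template to produce one false-belief evaluation item. The template wording, slot names, and populate helper are illustrative assumptions for exposition, not the authors' actual code or prompts; in the paper's framework an LLM proposes the slot values, and crossing conditions (e.g., whether the agent perceives the causal event) yields matched control items.

# Minimal sketch (an assumption, not the authors' implementation):
# filling a causal template's slots to generate one ToM test item.

TEMPLATE = (
    "{agent} wants to {desire}. {agent} sees that {percept}. "
    "{causal_event}, but {agent} does not notice this. "
    "Question: What does {agent} believe about {belief_target}?"
)

def populate(slots: dict) -> str:
    """Fill every slot of the causal template; each slot is one causal
    variable (desire, percept, external event) probed by the question."""
    return TEMPLATE.format(**slots)

# Hypothetical slot values for illustration.
item = populate({
    "agent": "Noor",
    "desire": "make a latte with oat milk",
    "percept": "the pitcher on the counter is full of oat milk",
    "causal_event": "A coworker swaps the oat milk for whole milk",
    "belief_target": "the contents of the pitcher",
})
print(item)

Running the sketch prints a complete scenario plus a belief question; flipping the "does not notice" clause to "notices" would produce the true-belief control for the same scenario.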
Pages: 12
Related Papers (50 total)
  • [1] Counterfactual reasoning: Testing language models' understanding of hypothetical scenarios
    Li, Jiaxuan
    Yu, Lang
    Ettinger, Allyson
    61ST CONFERENCE OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 2, 2023: 804 - 815
  • [2] Understanding models understanding language
    Søgaard, Anders
    SYNTHESE, 2022, 200 (06)
  • [3] Towards Understanding and Mitigating Social Biases in Language Models
    Liang, Paul Pu
    Wu, Chiyu
    Morency, Louis-Philippe
    Salakhutdinov, Ruslan
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021
  • [4] The Journey of Language Models in Understanding Natural Language
    Liu, Yuanrui
    Zhou, Jingping
    Sang, Guobiao
    Huang, Ruilong
    Zhao, Xinzhe
    Fang, Jintao
    Wang, Tiexin
    Li, Bohan
    WEB INFORMATION SYSTEMS AND APPLICATIONS, WISA 2024, 2024, 14883 : 331 - 363
  • [5] The Importance of Understanding Language in Large Language Models
    Youssef, Alaa
    Stein, Samantha
    Clapp, Justin
    Magnus, David
    AMERICAN JOURNAL OF BIOETHICS, 2023, 23 (10): 6 - 7
  • [6] CommonsenseVIS: Visualizing and Understanding Commonsense Reasoning Capabilities of Natural Language Models
    Wang, Xingbo
    Huang, Renfei
    Jin, Zhihua
    Fang, Tianqing
    Qu, Huamin
    IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2024, 30 (01): 273 - 283
  • [7] ViCor: Bridging Visual Understanding and Commonsense Reasoning with Large Language Models
    Zhou, Kaiwen
    Lee, Kwonjoon
    Misu, Teruhisa
    Wang, Xin Eric
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024: 10783 - 10795
  • [8] Large Language Models Are Reasoning Teachers
    Ho, Namgyu
    Schmid, Laura
    Yun, Se-Young
    PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023): LONG PAPERS, VOL 1, 2023: 14852 - 14882
  • [9] Warped Language Models for Noise Robust Language Understanding
    Namazifar, Mahdi
    Tur, Gokhan
    Hakkani-Tur, Dilek
    2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021: 981 - 988