Understanding Social Reasoning in Language Models with Language Models

Cited by: 0
Authors
Gandhi, Kanishk [1 ]
Franken, J.-Philipp [1]
Gerstenberg, Tobias [1 ]
Goodman, Noah D. [1 ]
Affiliations
[1] Stanford Univ, Stanford, CA 94305 USA
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
As Large Language Models (LLMs) become increasingly integrated into our everyday lives, understanding their ability to comprehend human mental states becomes critical for ensuring effective interactions. However, despite recent attempts to assess the Theory-of-Mind (ToM) reasoning capabilities of LLMs, the degree to which these models can align with human ToM remains a nuanced topic of exploration. This is primarily due to two distinct challenges: (1) inconsistent results from previous evaluations, and (2) concerns about the validity of existing evaluation methodologies. To address these challenges, we present a novel framework for procedurally generating evaluations with LLMs by populating causal templates. Using our framework, we create a new social reasoning benchmark (BigToM) for LLMs, which consists of 25 controls and 5,000 model-written evaluations. We find that human participants rate the quality of our benchmark higher than previous crowd-sourced evaluations and comparable to expert-written evaluations. Using BigToM, we evaluate the social reasoning capabilities of a variety of LLMs and compare model performance with human performance. Our results suggest that GPT-4 has ToM capabilities that mirror human inference patterns, though less reliably, while other LLMs struggle.
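To make the abstract's method concrete, below is a minimal Python sketch of populating a causal template to produce one false-belief evaluation item. The template wording, slot names, and populate helper are illustrative assumptions for exposition, not the authors' actual code or prompts; in the paper's framework an LLM proposes the slot values, and crossing conditions (e.g., whether the agent perceives the causal event) yields matched control items.

# Minimal sketch (an assumption, not the authors' implementation):
# filling a causal template's slots to generate one ToM test item.

TEMPLATE = (
    "{agent} wants to {desire}. {agent} sees that {percept}. "
    "{causal_event}, but {agent} does not notice this. "
    "Question: What does {agent} believe about {belief_target}?"
)

def populate(slots: dict) -> str:
    """Fill every slot of the causal template; each slot is one causal
    variable (desire, percept, external event) probed by the question."""
    return TEMPLATE.format(**slots)

# Hypothetical slot values for illustration.
item = populate({
    "agent": "Noor",
    "desire": "make a latte with oat milk",
    "percept": "the pitcher on the counter is full of oat milk",
    "causal_event": "A coworker swaps the oat milk for whole milk",
    "belief_target": "the contents of the pitcher",
})
print(item)

Running the sketch prints a complete scenario plus a belief question; flipping the "does not notice" clause to "notices" would produce the true-belief control for the same scenario.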
Pages: 12
Related Papers (50 total)
  • [1] Counterfactual reasoning: Testing language models' understanding of hypothetical scenarios
    Li, Jiaxuan
    Yu, Lang
    Ettinger, Allyson
    61ST CONFERENCE OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 2, 2023: 804 - 815
  • [2] Understanding models understanding language
    Søgaard, Anders
    SYNTHESE, 2022, 200 (06)
  • [3] Towards Understanding and Mitigating Social Biases in Language Models
    Liang, Paul Pu
    Wu, Chiyu
    Morency, Louis-Philippe
    Salakhutdinov, Ruslan
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021
  • [4] The Journey of Language Models in Understanding Natural Language
    Liu, Yuanrui
    Zhou, Jingping
    Sang, Guobiao
    Huang, Ruilong
    Zhao, Xinzhe
    Fang, Jintao
    Wang, Tiexin
    Li, Bohan
    WEB INFORMATION SYSTEMS AND APPLICATIONS, WISA 2024, 2024, 14883 : 331 - 363
  • [5] The Importance of Understanding Language in Large Language Models
    Youssef, Alaa
    Stein, Samantha
    Clapp, Justin
    Magnus, David
    AMERICAN JOURNAL OF BIOETHICS, 2023, 23 (10): 6 - 7
  • [6] CommonsenseVIS: Visualizing and Understanding Commonsense Reasoning Capabilities of Natural Language Models
    Wang, Xingbo
    Huang, Renfei
    Jin, Zhihua
    Fang, Tianqing
    Qu, Huamin
    IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2024, 30 (01): 273 - 283
  • [7] ViCor: Bridging Visual Understanding and Commonsense Reasoning with Large Language Models
    Zhou, Kaiwen
    Lee, Kwonjoon
    Misu, Teruhisa
    Wang, Xin Eric
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024: 10783 - 10795
  • [8] Large Language Models Are Reasoning Teachers
    Ho, Namgyu
    Schmid, Laura
    Yun, Se-Young
    PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023): LONG PAPERS, VOL 1, 2023: 14852 - 14882
  • [9] Warped Language Models for Noise Robust Language Understanding
    Namazifar, Mahdi
    Tur, Gokhan
    Hakkani-Tur, Dilek
    2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021: 981 - 988