Do LLMs Understand Social Knowledge? Evaluating the Sociability of Large Language Models with the SOCKET Benchmark

Citations: 0
Authors
Choi, Minje [1]
Pei, Jiaxin [1]
Kumar, Sagar [2]
Shu, Chang [3]
Jurgens, David [1]
Affiliations
[1] Univ Michigan, Ann Arbor, MI 48109 USA
[2] Northeastern Univ, Boston, MA USA
[3] Univ Cambridge, Cambridge, England
Source
2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2023) | 2023
Funding
National Science Foundation (USA);
Keywords
PRAGMATICS; IMPOLITENESS; EMOTION; HUMOR;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Large language models (LLMs) have been shown to perform well at a variety of syntactic, discourse, and reasoning tasks. While LLMs are increasingly deployed in many forms including conversational agents that interact with humans, we lack a grounded benchmark to measure how well LLMs understand social language. Here, we introduce a new theory-driven benchmark, SOCKET, that contains 58 NLP tasks testing social knowledge which we group into five categories: humor & sarcasm, offensiveness, sentiment & emotion, trustworthiness, and other social factors. In tests on the benchmark, we demonstrate that current models attain only moderate performance but reveal significant potential for task transfer among different types and categories of tasks, which were predicted from theory. Through zero-shot evaluations, we show that pretrained models already possess some innate but limited capabilities of social language understanding and training on one category of tasks can improve zero-shot testing on others. Our benchmark provides a systematic way to analyze model performance on an important dimension of language and points to clear room for improvement to build more socially-aware LLMs. The resources are released at https://github.com/minjechoi/SOCKET.
Pages: 11370-11403
Page count: 34