Social Value Alignment in Large Language Models

Cited by: 0
Authors
Abbo, Giulio Antonio [1]
Marchesi, Serena [2 ]
Wykowska, Agnieszka [2 ]
Belpaeme, Tony [1 ]
Affiliations
[1] Ghent University, imec, IDLab-AIRO, Ghent, Belgium
[2] S4HRI, Istituto Italiano di Tecnologia, Genoa, Italy
Source
VALUE ENGINEERING IN ARTIFICIAL INTELLIGENCE, VALE 2023 | 2024, Vol. 14520
关键词
Values; Large Language Models; LLM; Alignment; MIND
DOI
10.1007/978-3-031-58202-8_6
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Large Language Models (LLMs) have demonstrated remarkable proficiency in text generation and display an apparent understanding of both physical and social aspects of the world. In this study, we examine the capability of LLMs to generate responses that align with human values. We focus on five prominent LLMs - GPT-3, GPT-4, PaLM-2, LLaMA-2 and BLOOM - and compare their generated responses with those provided by human participants. To evaluate the value alignment of LLMs, we presented domestic scenarios to the models and elicited responses with minimal prompting instructions. Human raters judged the responses on appropriateness and value alignment. The results revealed that GPT-3, GPT-4 and PaLM-2 performed on par with human participants, displaying a notable level of value alignment in their generated responses. However, LLaMA-2 and BLOOM fell short in this respect, indicating a possible divergence from human values. Furthermore, our findings indicate that the raters had difficulty distinguishing between responses generated by LLMs and those written by humans, and in certain cases preferred the machine-generated responses. These findings shed light on the ability of state-of-the-art LLMs to align with human values, and also allow us to speculate on whether these models could be value-aware. This research contributes to the ongoing exploration of LLMs' understanding of ethical considerations and provides insights into their potential for engaging in value-driven interactions.
Pages: 83-97 (15 pages)
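The elicitation step summarized in the abstract (domestic scenarios presented with minimal prompting instructions, free-text responses collected for later human rating) could be approximated by a loop of the following shape. This is a minimal illustrative sketch, not the authors' code: the scenario text, the prompt wording, and the stand-in model callable are assumptions made for illustration only.

# Minimal sketch of the elicitation setup described in the abstract.
# The `models` mapping, scenario texts, and prompt wording below are
# illustrative assumptions; replace the stub with real API calls to
# GPT-3/4, PaLM-2, LLaMA-2 or BLOOM as appropriate.

import csv
from typing import Callable, Dict, List


def collect_responses(
    models: Dict[str, Callable[[str], str]],
    scenarios: List[str],
    out_path: str = "responses.csv",
) -> None:
    """Elicit one response per (model, scenario) pair and write them to CSV."""
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["model", "scenario", "response"])
        for scenario in scenarios:
            # Minimal prompting: the scenario plus a single open question.
            prompt = f"{scenario}\nWhat would you do?"
            for name, query in models.items():
                writer.writerow([name, scenario, query(prompt)])


if __name__ == "__main__":
    # Stand-in model used only so the sketch runs end to end.
    echo_model = lambda prompt: "I would ask the person what they prefer."
    collect_responses(
        {"stub-model": echo_model},
        ["You notice an elderly person struggling to carry groceries upstairs."],
    )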