Social Value Alignment in Large Language Models

Cited by: 0
Authors
Abbo, Giulio Antonio [1]
Marchesi, Serena [2 ]
Wykowska, Agnieszka [2 ]
Belpaeme, Tony [1 ]
Affiliations
[1] Ghent University, imec, IDLab-AIRO, Ghent, Belgium
[2] S4HRI, Istituto Italiano di Tecnologia, Genoa, Italy
Source
VALUE ENGINEERING IN ARTIFICIAL INTELLIGENCE, VALE 2023 | 2024, Vol. 14520
关键词
Values; Large Language Models; LLM; Alignment; MIND
DOI
10.1007/978-3-031-58202-8_6
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Large Language Models (LLMs) have demonstrated remarkable proficiency in text generation and display an apparent understanding of both physical and social aspects of the world. In this study, we examine the capability of LLMs to generate responses that align with human values. We focus on five prominent LLMs - GPT-3, GPT-4, PaLM-2, LLaMA-2 and BLOOM - and compare their generated responses with those provided by human participants. To evaluate the value alignment of LLMs, we presented domestic scenarios to the models and elicited responses with minimal prompting instructions. Human raters judged the responses on appropriateness and value alignment. The results revealed that GPT-3, GPT-4 and PaLM-2 performed on par with human participants, displaying a notable level of value alignment in their generated responses. However, LLaMA-2 and BLOOM fell short in this respect, indicating a possible divergence from human values. Furthermore, our findings indicate that the raters had difficulty distinguishing between responses generated by LLMs and those written by humans, and in certain cases preferred the machine-generated responses. These findings shed light on the ability of state-of-the-art LLMs to align with human values, and also allow us to speculate on whether these models could be value-aware. This research contributes to the ongoing exploration of LLMs' understanding of ethical considerations and provides insights into their potential for engaging in value-driven interactions.
Pages: 83-97 (15 pages)
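The elicitation step summarized in the abstract (domestic scenarios presented with minimal prompting instructions, free-text responses collected for later human rating) could be approximated by a loop of the following shape. This is a minimal illustrative sketch, not the authors' code: the scenario text, the prompt wording, and the stand-in model callable are assumptions made for illustration only.

# Minimal sketch of the elicitation setup described in the abstract.
# The `models` mapping, scenario texts, and prompt wording below are
# illustrative assumptions; replace the stub with real API calls to
# GPT-3/4, PaLM-2, LLaMA-2 or BLOOM as appropriate.

import csv
from typing import Callable, Dict, List


def collect_responses(
    models: Dict[str, Callable[[str], str]],
    scenarios: List[str],
    out_path: str = "responses.csv",
) -> None:
    """Elicit one response per (model, scenario) pair and write them to CSV."""
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["model", "scenario", "response"])
        for scenario in scenarios:
            # Minimal prompting: the scenario plus a single open question.
            prompt = f"{scenario}\nWhat would you do?"
            for name, query in models.items():
                writer.writerow([name, scenario, query(prompt)])


if __name__ == "__main__":
    # Stand-in model used only so the sketch runs end to end.
    echo_model = lambda prompt: "I would ask the person what they prefer."
    collect_responses(
        {"stub-model": echo_model},
        ["You notice an elderly person struggling to carry groceries upstairs."],
    )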