AI Alignment and Human Reward

Cited by: 7

Authors:

Butlin, Patrick [1]

Affiliations:

[1] King's College London, Department of Philosophy, London, England

Source:

AIES '21: Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society | 2021

Keywords:

Value alignment; reward functions; human values; value learning; intrinsic motivation; systems; habits

DOI:

10.1145/3461702.3462570

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

According to a prominent approach to AI alignment, AI agents should be built to learn and promote human values. However, humans value things in several different ways: we have desires and preferences of various kinds, and if we engage in reinforcement learning, we also have reward functions. One research project to which this approach gives rise is therefore to say which of these various classes of human values should be promoted. This paper takes on part of this project by assessing the proposal that human reward functions should be the target for AI alignment. There is some reason to believe that powerful AI agents which were aligned to values of this form would help us to lead good lives, but there is also considerable uncertainty about this claim, arising from unresolved empirical and conceptual issues in human psychology.

Pages: 437-445

Page count: 9
