From Preference-Based to Multiobjective Sequential Decision-Making

Times Cited: 0
Authors
Weng, Paul [1 ,2 ]
Affiliations
[1] SYSU CMU Joint Inst Engn, Sch Elect & Informat Technol, Guangzhou 510006, Peoples R China
[2] SYSU CMU Shunde Joint Res Inst, Shunde 528300, Peoples R China
Source
MULTI-DISCIPLINARY TRENDS IN ARTIFICIAL INTELLIGENCE (MIWAI 2016) | 2016 / Vol. 10053
Keywords
Sequential decision-making; Preference-based reinforcement learning; Multiobjective Markov decision process; Multiobjective reinforcement learning
DOI
10.1007/978-3-319-49397-8_20
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
In this paper, we present a link between preference-based and multiobjective sequential decision-making. While transforming a multiobjective problem into a preference-based one is quite natural, the other direction is less obvious. We show how this transformation (from preference-based to multiobjective) can be performed under the classic assumptions that preferences over histories are representable by additively decomposable utilities and that the decision criterion for evaluating policies in a state is based on expectation. This link yields a new source of multiobjective sequential decision-making problems (namely, when reward values are unknown) and justifies applying solution methods developed in one setting to the other.
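As a rough illustration of the direction the abstract describes (preference-based to multiobjective when reward values are unknown), here is a minimal Python sketch under the assumption that the scalar reward takes values in a small finite set of ordered but numerically unknown levels; the toy MDP and all names (n_states, vector_value, etc.) are illustrative choices, not code from the paper.

    import numpy as np

    rng = np.random.default_rng(0)

    n_states, n_actions, n_levels = 4, 2, 3   # three ordered reward levels r_0 < r_1 < r_2
    gamma = 0.9

    # Toy MDP: transition kernel P[s, a, s'] and the index L[s, a] of the reward level received
    P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
    L = rng.integers(n_levels, size=(n_states, n_actions))

    # Vector reward: R_vec[s, a] = e_{L[s, a]}, i.e., one objective per reward level
    R_vec = np.eye(n_levels)[L]               # shape (n_states, n_actions, n_levels)

    def vector_value(policy, tol=1e-10):
        # Expected discounted vector return V[s] of a deterministic policy
        # (each component = expected discounted number of occurrences of one reward level).
        V = np.zeros((n_states, n_levels))
        while True:
            V_new = np.array([R_vec[s, policy[s]] + gamma * P[s, policy[s]] @ V
                              for s in range(n_states)])
            if np.abs(V_new - V).max() < tol:
                return V_new
            V = V_new

    policy = rng.integers(n_actions, size=n_states)   # an arbitrary deterministic policy
    V = vector_value(policy)                          # multiobjective evaluation

    # If the unknown levels were later instantiated, say as (0.0, 0.5, 1.0),
    # the scalar expected utility of the original problem is recovered by a dot product.
    levels = np.array([0.0, 0.5, 1.0])
    print("Scalar values per state:", V @ levels)

In this sketch, a policy maximizing the scalarized value for strictly positive level values is in particular Pareto-optimal in the vector-valued problem, which is roughly the kind of correspondence that lets methods from one setting be reused in the other.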
Pages: 231-242
Number of pages: 12