From Preference-Based to Multiobjective Sequential Decision-Making

Times Cited: 0
Authors
Weng, Paul [1 ,2 ]
Affiliations
[1] SYSU CMU Joint Inst Engn, Sch Elect & Informat Technol, Guangzhou 510006, Peoples R China
[2] SYSU CMU Shunde Joint Res Inst, Shunde 528300, Peoples R China
Source
MULTI-DISCIPLINARY TRENDS IN ARTIFICIAL INTELLIGENCE (MIWAI 2016) | 2016 / Vol. 10053
Keywords
Sequential decision-making; Preference-based reinforcement learning; Multiobjective Markov decision process; Multiobjective reinforcement learning
DOI
10.1007/978-3-319-49397-8_20
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
In this paper, we present a link between preference-based and multiobjective sequential decision-making. While transforming a multiobjective problem into a preference-based one is quite natural, the other direction is less obvious. We show how this transformation (from preference-based to multiobjective) can be performed under the classic assumptions that preferences over histories are representable by additively decomposable utilities and that the decision criterion for evaluating policies in a state is based on expectation. This link yields a new source of multiobjective sequential decision-making problems (namely, when reward values are unknown) and justifies applying solution methods developed in one setting to the other.
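As a rough illustration of the direction the abstract describes (preference-based to multiobjective when reward values are unknown), here is a minimal Python sketch under the assumption that the scalar reward takes values in a small finite set of ordered but numerically unknown levels; the toy MDP and all names (n_states, vector_value, etc.) are illustrative choices, not code from the paper.

    import numpy as np

    rng = np.random.default_rng(0)

    n_states, n_actions, n_levels = 4, 2, 3   # three ordered reward levels r_0 < r_1 < r_2
    gamma = 0.9

    # Toy MDP: transition kernel P[s, a, s'] and the index L[s, a] of the reward level received
    P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
    L = rng.integers(n_levels, size=(n_states, n_actions))

    # Vector reward: R_vec[s, a] = e_{L[s, a]}, i.e., one objective per reward level
    R_vec = np.eye(n_levels)[L]               # shape (n_states, n_actions, n_levels)

    def vector_value(policy, tol=1e-10):
        # Expected discounted vector return V[s] of a deterministic policy
        # (each component = expected discounted number of occurrences of one reward level).
        V = np.zeros((n_states, n_levels))
        while True:
            V_new = np.array([R_vec[s, policy[s]] + gamma * P[s, policy[s]] @ V
                              for s in range(n_states)])
            if np.abs(V_new - V).max() < tol:
                return V_new
            V = V_new

    policy = rng.integers(n_actions, size=n_states)   # an arbitrary deterministic policy
    V = vector_value(policy)                          # multiobjective evaluation

    # If the unknown levels were later instantiated, say as (0.0, 0.5, 1.0),
    # the scalar expected utility of the original problem is recovered by a dot product.
    levels = np.array([0.0, 0.5, 1.0])
    print("Scalar values per state:", V @ levels)

In this sketch, a policy maximizing the scalarized value for strictly positive level values is in particular Pareto-optimal in the vector-valued problem, which is roughly the kind of correspondence that lets methods from one setting be reused in the other.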
Pages: 231-242
Number of pages: 12