Empirical evaluation methods for multiobjective reinforcement learning algorithms

Cited: 150
Authors
Vamplew, Peter [1 ]
Dazeley, Richard [1 ]
Berry, Adam [2 ]
Issabekov, Rustam [1 ]
Dekker, Evan [1 ]
Affiliations
[1] Univ Ballarat, Grad Sch Informat Technol & Math Sci, Ballarat, Vic 3353, Australia
[2] CSIRO Energy Ctr, Mayfield W, NSW 2304, Australia
Keywords
Multiobjective reinforcement learning; Multiple objectives; Empirical methods; Pareto fronts; Pareto optimal policies
DOI
10.1007/s10994-010-5232-5
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
While a number of algorithms for multiobjective reinforcement learning have been proposed, and a small number of applications developed, there has been very little rigorous empirical evaluation of the performance and limitations of these algorithms. This paper proposes standard methods for such empirical evaluation, to act as a foundation for future comparative studies. Two classes of multiobjective reinforcement learning algorithms are identified, and appropriate evaluation metrics and methodologies are proposed for each class. A suite of benchmark problems with known Pareto fronts is described, and future extensions and implementations of this benchmark suite are discussed. The utility of the proposed evaluation methods is demonstrated via an empirical comparison of two example learning algorithms.
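As a minimal sketch of the kind of Pareto-based evaluation the abstract describes, the Python snippet below extracts the non-dominated front from a set of per-policy returns and computes the two-objective hypervolume of that front relative to a reference point. The function names, the reference point, and the example returns are illustrative assumptions, not the paper's API or data; the paper itself defines the metrics and benchmarks in full.

```python
# Sketch (assumptions labelled): Pareto front extraction and 2-objective
# hypervolume, a common metric for comparing multiobjective RL algorithms.
from typing import List, Tuple

Point = Tuple[float, float]  # (return on objective 1, return on objective 2)

def pareto_front(points: List[Point]) -> List[Point]:
    """Keep the non-dominated points, assuming both objectives are maximised."""
    return [p for p in points
            if not any(q[0] >= p[0] and q[1] >= p[1] and q != p for q in points)]

def hypervolume_2d(front: List[Point], ref: Point) -> float:
    """Area dominated by the front relative to `ref`, a point chosen to be
    dominated by every front point (hypothetical choice for this sketch)."""
    # Sweep the front in decreasing order of objective 1, adding the
    # rectangle each point contributes above the previous point's objective 2.
    volume, prev_y = 0.0, ref[1]
    for x, y in sorted(front, key=lambda p: p[0], reverse=True):
        volume += (x - ref[0]) * (y - prev_y)
        prev_y = y
    return volume

if __name__ == "__main__":
    returns = [(1.0, 5.0), (2.0, 4.0), (3.0, 2.0), (1.5, 3.0)]  # last point is dominated
    front = pareto_front(returns)
    print(front)                                  # [(1.0, 5.0), (2.0, 4.0), (3.0, 2.0)]
    print(hypervolume_2d(front, ref=(0.0, 0.0)))  # 11.0: dominated area w.r.t. origin
```

With a known true Pareto front for a benchmark problem, the hypervolume of an algorithm's learned front can be compared against the hypervolume of the true front, giving a single scalar measure of how well the algorithm covers the trade-off surface.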
Pages: 51-80 (30 pages)