A Survey of Multi-Objective Sequential Decision-Making

Cited by: 357

Authors:

Roijers, Diederik M. [1]

Vamplew, Peter [2]

Whiteson, Shimon [1]

Dazeley, Richard [2]

Affiliations:

[1] Univ Amsterdam, Inst Informat, Amsterdam, Netherlands

[2] Univ Ballarat, Sch Sci Informat Technol & Engn, Ballarat, Vic 3353, Australia

Source:

JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH | 2013 / Vol. 48

Keywords:

MANY-OBJECTIVE OPTIMIZATION; OBSERVABLE MARKOV-PROCESSES; MULTI-POLICY OPTIMIZATION; INFINITE-HORIZON; REINFORCEMENT; ITERATION; UNCERTAINTY; ALGORITHM; NETWORKS; MODELS

DOI:

10.1613/jair.3987

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Sequential decision-making problems with multiple objectives arise naturally in practice and pose unique challenges for research in decision-theoretic planning and learning, which has largely focused on single-objective settings. This article surveys algorithms designed for sequential decision-making problems with multiple objectives. Though there is a growing body of literature on this subject, little of it makes explicit under what circumstances special methods are needed to solve multi-objective problems. Therefore, we identify three distinct scenarios in which converting such a problem to a single-objective one is impossible, infeasible, or undesirable. Furthermore, we propose a taxonomy that classifies multi-objective methods according to the applicable scenario, the nature of the scalarization function (which projects multi-objective values to scalar ones), and the type of policies considered. We show how these factors determine the nature of an optimal solution, which can be a single policy, a convex hull, or a Pareto front. Using this taxonomy, we survey the literature on multi-objective methods for planning and learning. Finally, we discuss key applications of such methods and outline opportunities for future work.
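The abstract's distinction between a convex hull and a Pareto front can be illustrated with a toy sketch. This is not code from the survey: the value vectors, the weight grid, and the function names (`linear_scalarize`, `dominates`, `pareto_front`, `convex_hull_members`) are hypothetical, and sampling a finite grid of weights only approximates the set of vectors that are optimal under some linear scalarization.

```python
from typing import List, Sequence, Set, Tuple

Vector = Tuple[float, ...]


def linear_scalarize(value: Sequence[float], weights: Sequence[float]) -> float:
    # Project a multi-objective value vector to a scalar with a weighted sum.
    return sum(w * v for w, v in zip(weights, value))


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    # a Pareto-dominates b: no worse in every objective, strictly better in at least one.
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(values: List[Vector]) -> List[Vector]:
    # Keep every value vector that no other vector dominates (original order preserved).
    return [v for v in values if not any(dominates(u, v) for u in values)]


def convex_hull_members(values: List[Vector], weight_grid: List[Vector]) -> List[Vector]:
    # Vectors that maximize the weighted sum for at least one sampled weight vector.
    # A finite grid only approximates the convex hull; ties go to the first maximizer.
    winners: Set[Vector] = set()
    for w in weight_grid:
        winners.add(max(values, key=lambda v: linear_scalarize(v, w)))
    return sorted(winners)


# Hypothetical two-objective value vectors, one per policy.
values = [(3.0, 0.0), (0.0, 3.0), (1.0, 1.0), (1.0, 0.5)]
weights = [(i / 10, 1 - i / 10) for i in range(11)]

print(pareto_front(values))                  # [(3.0, 0.0), (0.0, 3.0), (1.0, 1.0)]
print(convex_hull_members(values, weights))  # [(0.0, 3.0), (3.0, 0.0)]
```

Here (1.0, 1.0) is not dominated, so it lies on the Pareto front, but for any weights (w, 1 - w) it scores 1 while the better of (3.0, 0.0) and (0.0, 3.0) scores at least 1.5, so it never appears in the convex hull. A method restricted to linear scalarization would therefore never return that policy.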

References

Pages: 67-113

Page count: 47

144 entries in total

[81] Liu W, 2010, DES AUT TEST EUROPE, P602

[82] Lizotte DJ, 2012, J MACH LEARN RES, V13, P3253

[83] Lizotte DJ, 2010, 27 INT C MACH LEARN, P695

[84] Madani O, 1999, SIXTEENTH NATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE (AAAI-99)/ELEVENTH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE (IAAI-99), P541

[85] Mannor S, 2004, J MACH LEARN RES, V5, P325

[86] Maravall D, De Lope J, 2002, A reinforcement learning method for dynamic obstacle avoidance in robotic mechanisms, COMPUTATIONAL INTELLIGENT SYSTEMS FOR APPLIED RESEARCH, P485-494

[87] Mariano C, 2000, ADV ART INT INT JOIN

[88] Mariano C, 2000, LECT NOTES ARTIF INT, P212

[89] Mariano CE, 1999, GECCO-99: PROCEEDINGS OF THE GENETIC AND EVOLUTIONARY COMPUTATION CONFERENCE, P894

[90] Meisner EM, 2009, THESIS RENSSELAER PO
