Multi-objectivization and ensembles of shapings in reinforcement learning

Cited by: 30
Authors
Brys, Tim [1]
Harutyunyan, Anna [1,2]
Vrancx, Peter [1,2]
Nowé, Ann [1,2]
Taylor, Matthew E. [3]
Affiliations
[1] Vrije Univ Brussel, AI Lab, B-1050 Brussels, Belgium
[2] Vrije Univ Brussel, B-1050 Brussels, Belgium
[3] Washington State Univ, Sch Elect Engn & Comp Sci, Pullman, WA 99164 USA
Funding
US National Science Foundation
Keywords
Reinforcement learning; Multi-objectivization; Ensemble techniques; Reward shaping
DOI
10.1016/j.neucom.2017.02.096
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Ensemble techniques are a powerful approach to creating better decision makers in machine learning. Multiple decision makers are trained to solve a given task, grouped in an ensemble, and their decisions are aggregated. The ensemble derives its power from the diversity of its components: the assumption is that its members make mistakes on different inputs and that the majority is more likely to be correct than any individual member. Diversity usually comes from the different algorithms employed by the decision makers, or from the different inputs used to train them. We advocate a third way to achieve this diversity, called diversity of evaluation, using the principle of multi-objectivization: the process of transforming a single-objective problem into a multi-objective problem in order to solve the original problem faster and/or better. This is done either by decomposing the original objective or by adding extra objectives, typically derived from (heuristic) domain knowledge. The process creates a diverse set of feedback signals for what remains, underneath, a single-objective problem. In the context of ensemble techniques, these different ways of evaluating a (solution to a) problem allow the components of the ensemble to view the problem from different angles, generating the diversity the ensemble needs. In this paper, we argue that combining multi-objectivization with ensemble techniques is a powerful way to boost performance in reinforcement learning. We inject various pieces of heuristic information through reward shaping, creating several distinct enriched reward signals that can be strategically combined using ensemble techniques to reduce sample complexity. We provide theoretical guarantees and demonstrate the potential of the approach in a range of experiments. (C) 2017 Elsevier B.V. All rights reserved.
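The theoretical guarantees mentioned in the abstract most likely build on potential-based reward shaping (Ng, Harada and Russell, 1999), a standard result not restated in this record: adding a shaping term derived from a potential function Φ over states leaves the optimal policy of the original reward unchanged. A minimal statement of the construction, with Φ assumed to encode one piece of heuristic domain knowledge:

\[
F(s, a, s') = \gamma\,\Phi(s') - \Phi(s), \qquad R'(s, a, s') = R(s, a, s') + F(s, a, s').
\]

Each choice of Φ yields a distinct enriched reward R' that still ranks policies exactly as the base reward R does, which is what makes it safe to hand a different shaping to each member of an ensemble.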
Pages: 48-59
Page count: 12
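As a concrete illustration of the combination the abstract proposes, here is a minimal sketch of an ensemble of tabular Q-learners, each shaping the shared base reward with its own potential and voting on actions by simple majority. Everything here is assumed rather than taken from the paper: the class names ShapedQLearner and ShapingEnsemble, the ε-greedy exploration, the majority-vote aggregation, and the toy potentials in the example are all illustrative.

```python
import random
from collections import defaultdict


class ShapedQLearner:
    """Tabular Q-learner trained on a base reward enriched with
    potential-based shaping: F(s, a, s') = gamma * phi(s') - phi(s)."""

    def __init__(self, n_actions, phi, alpha=0.1, gamma=0.95):
        self.q = defaultdict(float)   # Q-values keyed by (state, action)
        self.n_actions = n_actions
        self.phi = phi                # heuristic potential over states
        self.alpha = alpha            # learning rate
        self.gamma = gamma            # discount factor

    def greedy_action(self, state):
        return max(range(self.n_actions), key=lambda a: self.q[(state, a)])

    def update(self, s, a, r, s_next, done):
        # Enrich the shared base reward with this learner's own shaping term.
        shaped_r = r + self.gamma * self.phi(s_next) - self.phi(s)
        bootstrap = 0.0 if done else self.gamma * max(
            self.q[(s_next, b)] for b in range(self.n_actions))
        td_error = shaped_r + bootstrap - self.q[(s, a)]
        self.q[(s, a)] += self.alpha * td_error


class ShapingEnsemble:
    """Aggregates the greedy votes of learners that differ only in their
    shaping potential, i.e. in how they evaluate the same task."""

    def __init__(self, learners, epsilon=0.1):
        self.learners = learners
        self.epsilon = epsilon        # shared epsilon-greedy exploration

    def act(self, state):
        if random.random() < self.epsilon:
            return random.randrange(self.learners[0].n_actions)
        votes = defaultdict(int)
        for learner in self.learners:
            votes[learner.greedy_action(state)] += 1
        return max(votes, key=votes.get)   # majority vote

    def update(self, s, a, r, s_next, done):
        # Every learner sees the same transition and base reward r,
        # but shapes it with its own potential.
        for learner in self.learners:
            learner.update(s, a, r, s_next, done)


# Hypothetical usage: three heuristics for a 1-D corridor, goal at state 10.
potentials = [lambda s: -abs(10 - s), lambda s: 0.1 * s, lambda s: 0.0]
ensemble = ShapingEnsemble([ShapedQLearner(2, phi) for phi in potentials])
```

Because each shaping term is potential-based, every ensemble member still optimizes the original objective; the members differ only in the intermediate feedback guiding their exploration, which is exactly the diversity of evaluation the abstract describes.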