META-GRADIENTS IN NON-STATIONARY ENVIRONMENTS

Cited by: 0
Authors
Luketina, Jelena [1 ,2 ]
Flennerhag, Sebastian [2 ]
Schroecker, Yannick [2 ]
Abel, David [2 ]
Zahavy, Tom [2 ]
Singh, Satinder [2 ]
Affiliations
[1] Univ Oxford, Oxford, England
[2] DeepMind, London, England
Source
CONFERENCE ON LIFELONG LEARNING AGENTS, 2022, Vol. 199
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Meta-gradient methods (Xu et al., 2018; Zahavy et al., 2020) offer a promising solution to the problem of hyperparameter selection and adaptation in non-stationary reinforcement learning problems. However, the properties of meta-gradients in such environments have not been systematically studied. In this work, we bring new clarity to meta-gradients in non-stationary environments. Concretely, we ask: (i) how much information should be given to the learned optimizers, so as to enable faster adaptation and generalization over a lifetime, (ii) what meta-optimizer functions are learned in this process, and (iii) whether meta-gradient methods provide a bigger advantage in highly non-stationary environments. To study the effect of information provided to the meta-optimizer, as in recent works (Flennerhag et al., 2022; Almeida et al., 2021), we replace the tuned meta-parameters of fixed update rules with learned meta-parameter functions of selected context features. The context features carry information about agent performance and changes in the environment and hence can inform learned meta-parameter schedules. We find that adding more contextual information is generally beneficial, leading to faster adaptation of meta-parameter values and increased performance. We support these results with a qualitative analysis of resulting meta-parameter schedules and learned functions of context features. Lastly, we find that without context, meta-gradients do not provide a consistent advantage over the baseline in highly non-stationary environments. Our findings suggest that contextualising meta-gradients can play a pivotal role in extracting high performance from meta-gradients in non-stationary settings.
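To make the abstract's core idea concrete, below is a minimal illustrative sketch (not the authors' implementation) of a contextual meta-gradient: a learning rate is produced by a small learned function of a context feature and trained by differentiating the loss after one inner update. The names (inner_loss, meta_lr), the toy non-stationary regression task, and the particular context signal are all assumptions for illustration; the paper studies reinforcement-learning agents and richer context features.

    # Minimal contextual meta-gradient sketch (assumed, illustrative only).
    import jax
    import jax.numpy as jnp

    def inner_loss(theta, batch):
        # Toy regression loss; stands in for the agent's objective.
        x, y = batch
        return jnp.mean((x @ theta - y) ** 2)

    def meta_lr(phi, context):
        # Learned meta-parameter schedule: maps a context feature
        # (e.g. time since the last environment change) to a positive
        # learning rate, instead of using one fixed tuned value.
        return jax.nn.softplus(phi[0] * context + phi[1])

    def outer_loss(phi, theta, batch, context):
        # One inner update with the contextual learning rate, then
        # evaluate the updated parameters (the meta-objective).
        g = jax.grad(inner_loss)(theta, batch)
        theta_new = theta - meta_lr(phi, context) * g
        return inner_loss(theta_new, batch), theta_new

    # Meta-gradient: differentiate the post-update loss w.r.t. phi.
    meta_grad = jax.grad(outer_loss, argnums=0, has_aux=True)

    key = jax.random.PRNGKey(0)
    theta = jnp.zeros(3)
    phi = jnp.array([0.0, -1.0])
    for step in range(200):
        key, k1, k2 = jax.random.split(key, 3)
        x = jax.random.normal(k1, (32, 3))
        # Non-stationary target: the regression weights flip sign
        # every 100 steps, mimicking an environment change.
        sign = 1.0 if (step // 100) % 2 == 0 else -1.0
        y = x @ (sign * jnp.array([1.0, -2.0, 0.5])) + 0.1 * jax.random.normal(k2, (32,))
        context = (step % 100) / 100.0  # stand-in non-stationarity signal
        g_phi, theta = meta_grad(phi, theta, (x, y), context)
        phi = phi - 0.01 * g_phi  # outer (meta) gradient step

In the paper's setting the inner update would be a reinforcement-learning update and the meta-objective would be evaluated on subsequent experience rather than the same batch; the supervised toy problem above only keeps the sketch self-contained.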
Pages: 16
References
22 in total
[1] Almeida D, 2021, arXiv preprint, DOI 10.48550/arXiv.2106.00958
[2] Andrychowicz M, 2016, Advances in Neural Information Processing Systems, Vol. 29
[3] Bengio Y, 2000, Gradient-based optimization of hyperparameters, Neural Computation, 12(8):1889-1900
[4] Erhan D, 2010, Journal of Machine Learning Research, Vol. 11, p. 625
[5] Flennerhag S, 2020, International Conference on Learning Representations
[6] Kirsch L, 2021, Advances in Neural Information Processing Systems, Vol. 34
[7] Kirsch L, 2020, Improving Generalization in Meta Reinforcement Learning Using Learned Objectives
[8] Maclaurin D, 2015, Proceedings of Machine Learning Research, Vol. 37, p. 2113
[9] Mahmood AR, 2012, IEEE International Conference on Acoustics, Speech and Signal Processing, p. 2121, DOI 10.1109/ICASSP.2012.6288330
[10] Oh J, 2020, Advances in Neural Information Processing Systems, Vol. 33