Evolving hierarchical memory-prediction machines in multi-task reinforcement learning

Cited by: 8

Authors:

Kelly, Stephen [1]

Voegerl, Tatiana [1]

Banzhaf, Wolfgang [1]

Gondro, Cedric [1]

Affiliations:

[1] Michigan State Univ, BEACON Ctr Study Evolut Act, E Lansing, MI 48824 USA

Source:

GENETIC PROGRAMMING AND EVOLVABLE MACHINES | 2021 / Vol. 22 / Issue 04

Funding:

Natural Sciences and Engineering Research Council of Canada; US National Science Foundation;

Keywords:

Genetic programming; Reinforcement learning; Temporal memory; Multi-task; MODEL;

DOI:

10.1007/s10710-021-09418-4

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

A fundamental aspect of intelligent agent behaviour is the ability to encode salient features of experience in memory and use these memories, in combination with current sensory information, to predict the best action for each situation such that long-term objectives are maximized. The world is highly dynamic, and behavioural agents must generalize across a variety of environments and objectives over time. This scenario can be modeled as a partially-observable multi-task reinforcement learning problem. We use genetic programming to evolve highly-generalized agents capable of operating in six unique environments from the control literature, including OpenAI's entire Classic Control suite. This requires the agent to support discrete and continuous actions simultaneously. No task-identification sensor inputs are provided, thus agents must identify tasks from the dynamics of state variables alone and define control policies for each task. We show that emergent hierarchical structure in the evolving programs leads to multi-task agents that succeed by performing a temporal decomposition and encoding of the problem environments in memory. The resulting agents are competitive with task-specific agents in all six environments. Furthermore, the hierarchical structure of programs allows for dynamic run-time complexity, which results in relatively efficient operation.


Pages: 573-605

Page count: 33
