Probabilistic Policy Reuse for Safe Reinforcement Learning

Cited by: 6
Authors
Garcia, Javier [1]
Fernandez, Fernando [1]
Affiliation
[1] Univ Carlos III Madrid, Ave Univ 30, Leganes 28911, Spain
Keywords
Reinforcement learning; case-based reasoning; software agents
DOI
10.1145/3310090
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
This work introduces Policy Reuse for Safe Reinforcement Learning, an algorithm that combines Probabilistic Policy Reuse and teacher advice for safe exploration in dangerous, continuous-state and continuous-action reinforcement learning problems whose dynamics are reasonably smooth and whose state space is Euclidean. The algorithm uses a monotonically increasing risk function that estimates the probability of ending up in failure from a given state; this risk function is defined in terms of how far that state is from the region of the state space already known to the learning agent. Probabilistic Policy Reuse is used to safely balance the exploitation of already learned knowledge, the exploration of new actions, and requests for teacher advice in parts of the state space considered dangerous. Specifically, the pi-reuse exploration strategy is used. Through experiments in the helicopter hover task and a business management problem, we show that the pi-reuse exploration strategy can completely avoid visits to undesirable situations while maintaining the performance (in terms of the classical long-term accumulated reward) of the final policy achieved.
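The balance the abstract describes — a distance-based risk function gating when the agent exploits its learned policy, explores, or defers to a teacher — can be sketched as follows. This is a minimal illustrative sketch, not the paper's exact formulation: the `tanh` squashing, the thresholds, and the policy interfaces are all assumptions introduced here.

```python
import math
import random

def risk(state, known_states, d_safe=1.0):
    """Monotonically increasing risk in the distance from the nearest
    known state; tanh squashing into [0, 1) is an illustrative choice."""
    d = min(math.dist(state, s) for s in known_states)
    return math.tanh(d / d_safe)

def random_action():
    """Placeholder exploratory action sampler for this sketch."""
    return random.uniform(-1.0, 1.0)

def pi_reuse_action(state, known_states, teacher_policy, learned_policy,
                    psi=0.5, epsilon=0.1, risk_threshold=0.3):
    """One pi-reuse-style action choice: in states judged risky, defer to
    the teacher with probability psi; otherwise act epsilon-greedily with
    the learned policy. Parameter values are hypothetical."""
    if risk(state, known_states) > risk_threshold and random.random() < psi:
        return teacher_policy(state)   # reuse the teacher's (past) policy
    if random.random() < epsilon:
        return random_action()         # explore a new action
    return learned_policy(state)       # exploit learned knowledge
```

As in the pi-reuse strategy, the teacher's advice is followed only probabilistically, so the learner can eventually improve on the teacher while the risk gate keeps exploration away from states far from its known region.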
Pages: 24