Reinforcement learning algorithm for non-stationary environments

Cited by: 63
Authors
Padakandla, Sindhu [1]
Prabuchandran, K. J. [1]
Bhatnagar, Shalabh [1]
Affiliation
[1] Indian Inst Sci, Dept Comp Sci & Automat, Bangalore, Karnataka, India
Keywords
Markov decision processes; Reinforcement learning; Non-stationary environments; Change detection; Classifiers
DOI
10.1007/s10489-020-01758-5
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Reinforcement learning (RL) methods learn optimal decisions in the presence of a stationary environment. However, the stationarity assumption on the environment is very restrictive. In many real-world problems, such as traffic signal control and robotic applications, one often encounters non-stationary environments, and in these scenarios RL methods yield sub-optimal decisions. In this paper, we therefore consider the problem of developing RL methods that obtain optimal decisions in a non-stationary environment. The goal is to maximize the long-term discounted reward accrued when the underlying model of the environment changes over time. To achieve this, we first adapt a change-point algorithm to detect changes in the statistics of the environment, and we then develop an RL algorithm that maximizes the long-run reward accrued. We illustrate that our change-point method detects changes in the model of the environment effectively and thus facilitates the RL algorithm in maximizing the long-run reward. We further validate the effectiveness of the proposed solution on non-stationary random Markov decision processes, a sensor energy management problem, and a traffic signal control problem.
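For intuition, below is a minimal sketch of the general recipe the abstract describes: run a tabular RL learner, monitor the statistics of the experience stream for a change point, and switch to a fresh model when one is flagged. This is not the authors' algorithm; the per-context Q-tables, the sliding-window mean-shift test (standing in for a proper change-point detector), and all parameter names are illustrative assumptions.

import numpy as np

# Sketch only: Q-learning with one Q-table per detected "context". A crude
# mean-shift test on recent rewards plays the role of the change detector.
class ContextQLearner:
    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.95,
                 epsilon=0.1, window=50, threshold=3.0):
        self.n_states, self.n_actions = n_states, n_actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.window, self.threshold = window, threshold
        self.q_tables = [np.zeros((n_states, n_actions))]  # one table per context
        self.context = 0
        self.recent, self.baseline = [], []

    def act(self, s):
        # Epsilon-greedy action selection against the current context's Q-table.
        if np.random.rand() < self.epsilon:
            return np.random.randint(self.n_actions)
        return int(np.argmax(self.q_tables[self.context][s]))

    def update(self, s, a, r, s_next):
        # Standard Q-learning update, then feed the reward to the detector.
        q = self.q_tables[self.context]
        q[s, a] += self.alpha * (r + self.gamma * q[s_next].max() - q[s, a])
        self._detect_change(r)

    def _detect_change(self, r):
        # Flag a change when the mean reward in a sliding window drifts more
        # than `threshold` standard deviations from the baseline window.
        self.recent.append(r)
        if len(self.recent) > self.window:
            self.recent.pop(0)
        if len(self.baseline) < self.window:
            self.baseline.append(r)
            return
        mu = np.mean(self.baseline)
        sigma = np.std(self.baseline) + 1e-8
        if abs(np.mean(self.recent) - mu) > self.threshold * sigma:
            # Change detected: start a fresh Q-table for the new environment model.
            self.q_tables.append(np.zeros((self.n_states, self.n_actions)))
            self.context = len(self.q_tables) - 1
            self.baseline, self.recent = list(self.recent), []

# Illustrative usage on a toy problem whose reward regime flips at t = 500.
agent = ContextQLearner(n_states=2, n_actions=2)
s = 0
for t in range(1000):
    a = agent.act(s)
    r = float(np.random.normal(1.0 if t < 500 else -1.0, 0.1))
    s_next = np.random.randint(2)
    agent.update(s, a, r, s_next)
    s = s_next

A real implementation would replace the mean-shift heuristic with a statistically grounded change-point test and could reuse previously learned Q-tables when an old context recurs.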
Pages: 3590-3606
Number of pages: 17