A survey for deep reinforcement learning in markovian cyber-physical systems: Common problems and solutions

Times Cited: 19
Authors
Rupprecht, Timothy [1 ]
Wang, Yanzhi [1 ]
Affiliations
[1] Northeastern Univ, 360 Huntington Ave, Boston, MA 02115 USA
Funding
U.S. National Science Foundation;
Keywords
Deep reinforcement learning; Cyber-physical systems; Motor control; Resource allocation; HVAC; DEMAND;
DOI
10.1016/j.neunet.2022.05.013
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Deep Reinforcement Learning (DRL) is increasingly applied to automation tasks in cyber-physical systems. Recording the developing trends in DRL's applications helps researchers overcome common problems with common solutions. This survey investigates trends in two applied settings: motor control tasks and resource allocation tasks. The common problems include intractability of the action or state space, as well as the prohibitive cost of training systems from scratch in the real world: real-world training data are sparse and difficult to derive, and real-world training can damage the learning system itself. Researchers have produced a set of common as well as unique solutions. To tackle intractability, they have guided network training with handcrafted reward functions and auxiliary learning, and have simplified the state or action spaces before performing transfer learning to more complex systems. Many state-of-the-art algorithms reformulate problems as multi-agent or hierarchical learning to reduce the intractability of the state or action spaces for any single agent. Common solutions to the prohibitive cost of training include benchmarks and simulations; these require a feature space shared between simulation and the real world, without which the reality gap problem arises. To our knowledge, this is the first survey that studies DRL as applied in the real world at this scope. It is our hope that the common solutions surveyed become common practice. (c) 2022 Elsevier Ltd. All rights reserved.
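One of the common solutions the abstract mentions, guiding training with handcrafted reward functions, can be illustrated with a minimal sketch. The task (1-D target reaching), the potential function, and all coefficients below are illustrative assumptions, not taken from any of the surveyed papers; the sketch only shows how potential-based shaping densifies a sparse success signal without changing the optimal policy.

```python
def sparse_reward(pos, target, tol=0.05):
    """Sparse success signal: +1 only when the agent reaches the target.
    Such rewards are hard to learn from, since most transitions yield 0."""
    return 1.0 if abs(pos - target) < tol else 0.0


def shaped_reward(pos, next_pos, target, gamma=0.99):
    """Handcrafted potential-based shaping (hypothetical example):
    adds gamma * phi(s') - phi(s) to the sparse reward, giving dense
    feedback on every step while preserving the optimal policy."""
    def phi(p):
        # Potential grows as the agent nears the target.
        return -abs(p - target)
    return sparse_reward(next_pos, target) + gamma * phi(next_pos) - phi(pos)
```

With this shaping, a step toward the target earns a small positive reward and a step away earns a small negative one, so the agent receives a learning signal long before it first succeeds.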
Pages: 13-36
Page Count: 24
References
128 in total
  • [1] Agarwal, Mridul, 2019, arXiv preprint
  • [2] Al-Abbasi, Abubakr O., 2019, CoRR, abs/1903.03882
  • [3] Andrychowicz, Marcin, 2017, CoRR, abs/1707.01495
  • [4] [Anonymous], 2016, CoRR, abs/1610.04286
  • [5] [Anonymous], 2017, CoRR, abs/1703.09035
  • [6] Deep Reinforcement Learning: A Brief Survey
    Arulkumaran, Kai
    Deisenroth, Marc Peter
    Brundage, Miles
    Bharath, Anil Anthony
    [J]. IEEE SIGNAL PROCESSING MAGAZINE, 2017, 34 (06) : 26 - 38
  • [7] Barceló J, 2004, J INTELL ROBOT SYST, V41, P173
  • [8] Bellemare, 2017, arXiv:1707.06887 [cs.LG]
  • [9] The Theory of Dynamic Programming
    Bellman, R.
    [J]. BULLETIN OF THE AMERICAN MATHEMATICAL SOCIETY, 1954, 60 (06) : 503 - 515
  • [10] A neural network controller for continuum robots
    Braganza, David
    Dawson, Darren M.
    Walker, Ian D.
    Nath, Nitendra
    [J]. IEEE TRANSACTIONS ON ROBOTICS, 2007, 23 (06) : 1270 - 1277