Operational Safe Control for Reinforcement-Learning-Based Robot Autonomy

Cited by: 0
Authors
Zhou, Xu [1 ]
Affiliations
[1] Changshu Inst Technol, Suzhou 215500, Peoples R China
Source
2021 PROCEEDINGS OF THE 40TH CHINESE CONTROL CONFERENCE (CCC) | 2021
Keywords
Operational safe control; reinforcement learning; robot autonomy;
DOI
Not available
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Reinforcement learning (RL) has been widely used for robot autonomy because it can adapt to dynamic or unknown environments by automatically learning optimal control policies from the interactions between robots and environments. However, practical deployment of RL can endanger the safety of both robots and environments because many RL methods must experience failures during the training phase. These failures can be reduced or avoided by assuming that prior knowledge about the states and environments is available during training, but this assumption rarely holds in practical applications, especially in unknown environments. In addition, restarting a training episode can be difficult in practice because the robot may be stuck in a failure state. To solve these problems, we propose an operational safe control framework that automatically recovers from failures and reduces failure risks without any prior knowledge. Our framework consists of three steps: (1) detect failures and revert to safe actions, (2) collect correction samples to learn a potential that provides internal environment information to the robot, and (3) use the potential to shape a safe reward that biases exploration toward safe behavior. A maze navigation example demonstrates that our method outperforms traditional reinforcement learning with significantly fewer failures.
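The sketch below (Python) illustrates how the three steps could fit into a tabular Q-learning loop for maze navigation. It is a minimal illustration under assumed details: the 5x5 grid, the obstacle set, the distance-based learn_potential heuristic, and the shaping term F(s, s') = gamma * Phi(s') - Phi(s) (standard potential-based reward shaping) are assumptions made for illustration, not the paper's exact formulation.

    import numpy as np

    # Sketch of the three steps on a grid maze with tabular Q-learning:
    # (1) detect a failure and revert to a safe state,
    # (2) collect correction samples and re-fit a potential Phi,
    # (3) add F(s, s') = gamma * Phi(s') - Phi(s) to the reward.

    GRID = 5                          # 5x5 maze, states 0..24 (assumed layout)
    OBSTACLES = {7, 12, 17}           # "failure" cells (assumed)
    GOAL = 24
    ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right
    GAMMA, ALPHA, EPS = 0.95, 0.1, 0.2

    def step(s, a):
        r, c = divmod(s, GRID)
        dr, dc = ACTIONS[a]
        nr = min(max(r + dr, 0), GRID - 1)
        nc = min(max(c + dc, 0), GRID - 1)
        return nr * GRID + nc

    def learn_potential(correction_samples):
        # Simple heuristic: the potential is more negative near observed failures.
        phi = np.zeros(GRID * GRID)
        for s_fail in correction_samples:
            fr, fc = divmod(s_fail, GRID)
            for s in range(GRID * GRID):
                r, c = divmod(s, GRID)
                phi[s] -= 1.0 / (1.0 + abs(r - fr) + abs(c - fc))
        return phi

    Q = np.zeros((GRID * GRID, len(ACTIONS)))
    corrections = set()               # states where failures were detected
    phi = np.zeros(GRID * GRID)

    for episode in range(300):
        s = 0
        for t in range(100):
            a = np.random.randint(4) if np.random.rand() < EPS else int(np.argmax(Q[s]))
            s_next = step(s, a)

            if s_next in OBSTACLES:                 # step 1: detect failure ...
                corrections.add(s_next)             # step 2: collect correction sample
                phi = learn_potential(corrections)  #         and re-fit the potential
                r, s_next = -1.0, s                 # ... then revert to the safe state
            else:
                r = 1.0 if s_next == GOAL else -0.01

            shaped = r + GAMMA * phi[s_next] - phi[s]   # step 3: safe shaped reward
            Q[s, a] += ALPHA * (shaped + GAMMA * Q[s_next].max() - Q[s, a])
            s = s_next
            if s == GOAL:
                break

In this form the learned potential only steers exploration away from observed failure regions; because the shaping is potential-based, it does not change which policy is optimal for the underlying maze task.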
Pages: 4091-4095
Page count: 5