rocorl: Transferable Reinforcement Learning-Based Robust Control for Cyber-Physical Systems With Limited Data Updates

Cited by: 2
Authors
Yoo, Gwangpyo [1 ]
Yoo, Minjong [1 ]
Yeom, Ikjun [1 ]
Woo, Honguk [1 ]
Affiliations
[1] Sungkyunkwan Univ, Dept Comp Sci & Engn, Suwon 16419, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Real-time systems; Data models; Robot sensing systems; Reinforcement learning; Training; Sensors; Cyber-physical systems; Cyber-physical system; real-time data; reinforcement learning; model-based learning; stale observations; MDPs;
DOI
10.1109/ACCESS.2020.3044945
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Autonomous control systems increasingly use machine learning technologies to process sensor data and to make timely, informed decisions about control functions based on the processing results. Among these technologies, reinforcement learning (RL) with deep neural networks has recently been recognized as a feasible solution, since it enables learning through interaction with the control system's environment. In this paper, we consider RL-based control models and address the problem of temporally outdated observations, which frequently arises in dynamic cyber-physical environments and can hinder broad adoption of RL methods for autonomous control systems. Specifically, we present an RL-based robust control model, rocorl, that exploits a hierarchical learning structure in which a set of low-level policy variants is trained under stale observations and their learned knowledge is then transferred to a target environment with limited timely data updates. To do so, we employ an autoencoder-based observation transfer scheme for systematically training a set of transferable control policies, and an aggregated model-based learning scheme for training a high-level orchestrator in the hierarchy in a data-efficient manner. Our experiments show that rocorl is robust under various conditions of distributed sensor data updates, compared with several other models including a state-of-the-art POMDP method.
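As a rough illustration of the hierarchical structure described in the abstract, the sketch below shows a set of low-level policy variants, each associated with a particular observation-staleness condition, and a high-level orchestrator that selects among them at run time. All names (StalePolicy, Orchestrator) and the nearest-staleness selection rule are hypothetical placeholders, not taken from the paper; in rocorl the low-level variants are learned policies and the orchestrator is itself trained with an aggregated model-based scheme.

```python
# Minimal sketch (not the paper's implementation) of the hierarchical idea:
# several low-level policy variants, each trained for a different sensor
# staleness, and a high-level orchestrator that picks which variant to use.
import numpy as np


class StalePolicy:
    """A low-level policy variant tied to a specific staleness level.

    The 'policy' here is a fixed random linear map standing in for a
    learned neural-network policy.
    """

    def __init__(self, obs_dim: int, act_dim: int, staleness: int, seed: int = 0):
        rng = np.random.default_rng(seed + staleness)
        self.staleness = staleness  # sensor delay (in steps) this variant was trained for
        self.weights = rng.normal(size=(act_dim, obs_dim))

    def act(self, observation: np.ndarray) -> np.ndarray:
        return np.tanh(self.weights @ observation)


class Orchestrator:
    """High-level controller choosing a low-level variant per time step.

    Placeholder rule: pick the variant whose training staleness is closest
    to the currently observed sensor delay.
    """

    def __init__(self, policies):
        self.policies = policies

    def select(self, observed_staleness: int) -> StalePolicy:
        return min(self.policies, key=lambda p: abs(p.staleness - observed_staleness))


if __name__ == "__main__":
    obs_dim, act_dim = 8, 2
    # One low-level variant per assumed staleness condition (0, 2, 4 steps).
    variants = [StalePolicy(obs_dim, act_dim, staleness=s) for s in (0, 2, 4)]
    orchestrator = Orchestrator(variants)

    stale_obs = np.zeros(obs_dim)  # placeholder for a delayed sensor reading
    policy = orchestrator.select(observed_staleness=3)
    action = policy.act(stale_obs)
    print("chosen variant staleness:", policy.staleness, "action:", action)
```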
Pages: 225370-225383
Number of pages: 14