Model-based deep reinforcement learning with heuristic search for satellite attitude control

Cited by: 3

Authors:

Xu, Ke [1]

Wu, Fengge [1]

Zhao, Junsuo [1]

Affiliation:

[1] Institute of Software, Chinese Academy of Sciences, Beijing, People's Republic of China

Source:

INDUSTRIAL ROBOT-THE INTERNATIONAL JOURNAL OF ROBOTICS RESEARCH AND APPLICATION | 2019, Vol. 46, No. 3

Funding:

National Natural Science Foundation of China

Keywords:

Control; Artificial intelligence; Deep reinforcement learning; Satellite attitude; Tracking control

DOI:

10.1108/IR-05-2018-0086

Chinese Library Classification (CLC) number:

T [Industrial Technology]

Discipline classification code:

08

Abstract:

Purpose: Deep reinforcement learning has developed rapidly in recent years and has proved effective on difficult problems such as robotics and the game of Go. Meanwhile, satellite attitude control systems still rely mainly on classical control techniques such as proportional-integral-derivative (PID) control and sliding mode control, which face problems with adaptability and automation.

Design/methodology/approach: This paper proposes an approach based on deep reinforcement learning to increase the adaptability and autonomy of satellite control systems. It is a model-based algorithm that can find solutions with fewer learning episodes than model-free algorithms.

Findings: Simulation experiments show that when classical control fails, this approach can find a solution and reach the target within hundreds of rounds of exploration and learning.

Originality/value: This approach is a non-gradient method that uses heuristic search to optimize the policy and avoid local optima. Compared with classical control techniques, it does not need prior knowledge of the satellite or its orbit, can adapt to different situations by learning from data, and can adapt to different satellites and different tasks through transfer learning.

Pages: 415-420

Page count: 6

