Training a model-free reinforcement learning controller for a 3-degree-of-freedom helicopter under multiple constraints

Cited by: 7
Authors
Xue, Shengri [1 ]
Li, Zhan [1 ,2 ]
Yang, Liu [2 ,3 ]
Affiliations
[1] Harbin Inst Technol, Res Inst Intelligent Control & Syst, Harbin 150001, Heilongjiang, Peoples R China
[2] Harbin Inst Technol, State Key Lab Robot & Syst, Harbin, Heilongjiang, Peoples R China
[3] Harbin Univ Sci & Technol, Coll Automat, Harbin, Heilongjiang, Peoples R China
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China;
Keywords
Helicopter; reinforcement learning; reward function module; policy gradient; actor-critic-based controllers; OUTPUT-FEEDBACK CONTROL; DISCRETE-TIME-SYSTEMS; H-INFINITY CONTROL; TRACKING CONTROL; LABORATORY HELICOPTER; 3-DOF HELICOPTER; COMMAND;
DOI
10.1177/0020294019847711
Chinese Library Classification
TP [Automation technology, computer technology];
Subject classification code
0812;
Abstract
The purpose of this article is to design data-driven attitude controllers for a 3-degree-of-freedom experimental helicopter under multiple constraints. The controllers are updated using a reinforcement learning technique. The 3-degree-of-freedom helicopter platform approximates a practical helicopter attitude control system and includes realistic features such as complicated dynamics, coupling, and uncertainties. The proposed method first constructs the training environment, in which user-defined constraints and performance expectations are encoded by a reward function module. Then, actor-critic-based controllers are designed for the helicopter's elevation and pitch axes. Next, the policy gradient method, an important branch of reinforcement learning algorithms, is used to train the networks and optimize the controllers. Finally, experimental results obtained on the 3-degree-of-freedom helicopter platform illustrate the advantages of the proposed method in satisfying multiple control constraints.
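The abstract describes the pipeline (a reward module encoding constraints, actor-critic controllers, and policy-gradient updates) only at a high level. The sketch below is a minimal, hypothetical Python/NumPy illustration of that kind of pipeline, not the authors' implementation: it assumes a toy linearized elevation/pitch model, a reward function that penalizes state and input constraint violations, a linear Gaussian actor, and a quadratic critic trained with TD(0). All matrices, weights, and limits are invented for demonstration.

```python
# Illustrative sketch only: a minimal actor-critic policy-gradient loop on a toy,
# linearized 2-axis (elevation/pitch) attitude model. The dynamics matrices, reward
# weights, and constraint limits are assumptions, not values from the paper.
import numpy as np

rng = np.random.default_rng(0)

# Toy discrete-time linearized dynamics: x = [elevation, pitch] tracking error,
# u = [front, back] motor input deviation. A and B are placeholders.
A = np.array([[0.98, 0.02], [0.01, 0.97]])
B = np.array([[0.05, 0.05], [0.04, -0.04]])

U_MAX = 1.0   # assumed input (actuator) constraint
X_MAX = 0.5   # assumed attitude (state) constraint

def reward(x, u):
    """Reward module: tracking/effort cost plus penalties for constraint violations."""
    r = -float(x @ x) - 0.1 * float(u @ u)
    r -= 5.0 * np.sum(np.maximum(np.abs(x) - X_MAX, 0.0))  # state-constraint penalty
    r -= 5.0 * np.sum(np.maximum(np.abs(u) - U_MAX, 0.0))  # input-constraint penalty
    return r

# Linear Gaussian actor u ~ N(Kx, sigma^2 I) and quadratic critic V(x) = x' P x.
K = np.zeros((2, 2))
P = np.zeros((2, 2))
sigma, gamma, lr_actor, lr_critic = 0.1, 0.95, 1e-3, 1e-2

for episode in range(2000):
    x = rng.uniform(-0.3, 0.3, size=2)
    for _ in range(50):
        mean = K @ x
        u = mean + sigma * rng.standard_normal(2)
        x_next = A @ x + B @ u
        r = reward(x, u)

        # TD error serves as the advantage estimate.
        v, v_next = float(x @ P @ x), float(x_next @ P @ x_next)
        delta = r + gamma * v_next - v

        # Critic: semi-gradient TD(0) update of the quadratic value parameters.
        P += lr_critic * delta * np.outer(x, x)

        # Actor: policy-gradient ascent, grad log pi = (u - Kx) x' / sigma^2.
        K += lr_actor * delta * np.outer(u - mean, x) / sigma**2

        x = x_next

print("learned feedback gain K:\n", K)
```

Folding the constraints into the reward, as above, is only one possible reading of the paper's "reward function module"; the actual controller structure, network parameterization, and constraint handling are those reported in the article itself.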
Pages: 844-854
Page count: 11