Inverse reinforcement learning control for trajectory tracking of a multirotor UAV

Cited by: 0
Authors
Seungwon Choi
Suseong Kim
H. Jin Kim
Affiliations
[1] Department of Mechanical and Aerospace Engineering, Seoul National University
Source
International Journal of Control, Automation and Systems | 2017, Vol. 15
Keywords
Inverse reinforcement learning; learning from demonstration; multirotor control; particle swarm optimization
DOI: not available
Abstract
The main purpose of this paper is to learn the control performance of an expert by imitating demonstrations flown on a multirotor UAV (unmanned aerial vehicle) by an expert pilot. First, we collect a set of expert demonstrations of the task we want to learn, and from this dataset we extract a representative trajectory, consisting of a sequence of states and inputs. The trajectory is obtained using a hidden Markov model (HMM) and dynamic time warping (DTW). In the next step, the multirotor learns to track this trajectory for imitation. Although feed-forward input data are available for each time step, applying these inputs directly can deteriorate the stability of the multirotor due to insufficient data for generalization and numerical issues. For that reason, a controller is needed that generates input commands for a suitable flight maneuver. To design such a controller, we learn a hidden reward function of quadratic form from the demonstrated flights using inverse reinforcement learning. After finding the optimal reward function that minimizes the trajectory tracking error, we design a reinforcement-learning-based controller using this reward function. Simulations and experiments on a multirotor UAV show successful imitation results.
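The abstract mentions that demonstrations are aligned with dynamic time warping (DTW) before a representative trajectory is extracted. As a minimal sketch of that alignment step — not the authors' implementation, and simplified from multidimensional state trajectories to scalar sequences — the classic DTW recurrence can be written as:

```python
def dtw_distance(a, b):
    """Dynamic-time-warping distance between two 1-D sequences.

    DTW finds the minimum-cost monotone alignment between sequences of
    possibly different lengths, which is what lets demonstrations of the
    same maneuver (flown at slightly different speeds) be compared and
    averaged into one representative trajectory.
    """
    n, m = len(a), len(b)
    INF = float("inf")
    # D[i][j] = cost of the best alignment of a[:i] with b[:j]
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # step choices: match-advance-both, insert, or delete
            D[i][j] = cost + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    return D[n][m]
```

Two demonstrations that trace the same path at different timings align with zero cost, e.g. `dtw_distance([1, 2, 3], [1, 2, 2, 3])` returns `0.0`; the paper applies the same idea per state dimension before HMM-based extraction.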
Pages: 1826-1834 (8 pages)