Proximal policy optimization with an integral compensator for quadrotor control

Cited by: 0
Authors
Huan Hu
Qing-ling Wang
Affiliation
[1] School of Automation, Southeast University
Source
Frontiers of Information Technology & Electronic Engineering | 2020 / Vol. 21
Keywords
Reinforcement learning; Proximal policy optimization; Quadrotor control; Neural network
DOI
Not available
Chinese Library Classification (CLC) numbers
TP183; TP273
Abstract
We use the proximal policy optimization (PPO) reinforcement learning algorithm to optimize a stochastic control strategy for speed control of a "model-free" quadrotor. The quadrotor is controlled by four learned neural networks, which map the system states directly to control commands in an end-to-end style. Introducing an integral compensator into the actor-critic framework greatly enhances the speed tracking accuracy and robustness. In addition, a two-phase learning scheme that includes both offline and online learning is developed for practical use. A model with strong generalization ability is learned in the offline phase, and the flight policy of the model is then continuously optimized in the online learning phase. Finally, the performance of the proposed algorithm is compared with that of the traditional PID algorithm.
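The abstract does not specify how the integral compensator enters the actor-critic framework. As an illustrative sketch only, and not the authors' implementation, one common way to realize the idea is to append the accumulated speed-tracking error to the observation the policy sees, while PPO itself optimizes the standard clipped surrogate objective. The names `IntegralCompensator`, `augment`, and `ppo_clip_term`, the anti-windup clamp `limit`, the control period `dt`, and the three-axis velocity layout are all assumptions introduced here.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class IntegralCompensator:
    """Accumulates the speed-tracking error so a policy can observe it.

    Hypothetical helper: the clamp and the 3-axis layout are assumptions,
    not details taken from the paper.
    """
    dt: float                 # control period in seconds (assumed)
    limit: float = 1.0        # anti-windup clamp on each integral term (assumed)
    integral: List[float] = field(default_factory=lambda: [0.0, 0.0, 0.0])

    def reset(self) -> None:
        """Clear the accumulated error, e.g. at the start of an episode."""
        self.integral = [0.0, 0.0, 0.0]

    def augment(self, state: Sequence[float], v_ref: Sequence[float],
                v: Sequence[float]) -> List[float]:
        """Return state + current speed error + clamped integral of that error."""
        error = [r - m for r, m in zip(v_ref, v)]
        self.integral = [
            max(-self.limit, min(self.limit, i + e * self.dt))
            for i, e in zip(self.integral, error)
        ]
        return list(state) + error + self.integral


def ppo_clip_term(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """Per-sample PPO clipped surrogate: min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped * advantage)
```

In this sketch, a persistent speed error keeps growing the integral term in the observation until the clamp is reached, which gives the policy a signal for removing steady-state error, much as the I term does in a PID controller.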
Pages: 777-795
Page count: 18