Autonomous Obstacle Avoidance and Target Tracking of UAV Based on Meta-Reinforcement Learning

Cited by: 0
Authors
Jiang W. [1,2]
Wu J. [1,2]
Wang Y. [1,2]
Affiliations
[1] College of Electrical and Information Engineering, Hunan University, Changsha
[2] National Engineering Research Center of Robot Visual Perception & Control Technology, Hunan University, Changsha
Source
Hunan Daxue Xuebao/Journal of Hunan University Natural Sciences | 2022 / Vol. 49 / No. 6
Funding
National Natural Science Foundation of China
Keywords
autonomous obstacle avoidance; meta-reinforcement learning; path planning; target tracking; unmanned aerial vehicle (UAV)
DOI
10.16339/j.cnki.hdxbzkb.2022290
Chinese Library Classification number
Subject classification code
Abstract
Traditional deep reinforcement learning suffers from low training efficiency and weak adaptability to changing environments when applied to autonomous obstacle avoidance and target tracking for unmanned aerial vehicles (UAVs). To overcome these problems, this paper designs an inner and outer meta-parameter update rule by incorporating Model-Agnostic Meta-Learning (MAML) into the Deep Deterministic Policy Gradient (DDPG) algorithm, and proposes a Meta-Deep Deterministic Policy Gradient (Meta-DDPG) algorithm in order to improve the convergence speed and generalization ability of the model. Furthermore, basic meta-task sets are constructed in the model's pre-training stage to improve the efficiency of pre-training in practical engineering. Finally, the proposed algorithm is simulated and verified in various testing environments. The results show that introducing the basic meta-task sets makes the model's pre-training more efficient, and that the Meta-DDPG algorithm has better convergence characteristics and environmental adaptability than the DDPG algorithm. Moreover, meta-learning and the basic meta-task sets are generally applicable to deterministic-policy reinforcement learning. © 2022 Hunan University. All rights reserved.
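The abstract does not spell out the meta-parameter update rule, so the following is only a minimal sketch of the generic MAML-style two-level update that it builds on, not the paper's Meta-DDPG. It assumes toy tasks with the quadratic loss 0.5 * ||theta - center||^2 in place of DDPG's actor and critic losses, and every name in it (`loss_grad`, `inner_update`, `meta_step`, `meta_train`, `alpha`, `beta`, `centers`) is hypothetical.

```python
import numpy as np

def loss_grad(theta, center):
    """Gradient of the assumed toy task loss 0.5 * ||theta - center||^2."""
    return theta - center

def inner_update(theta, center, alpha):
    """Inner loop: one gradient step that adapts theta to a single task."""
    return theta - alpha * loss_grad(theta, center)

def meta_step(theta, centers, alpha, beta):
    """Outer loop: move the meta-parameters using post-adaptation gradients."""
    meta_grad = np.zeros_like(theta)
    for center in centers:
        adapted = inner_update(theta, center, alpha)
        # Chain rule through the inner step: for this quadratic loss the
        # Jacobian d(adapted)/d(theta) equals (1 - alpha) * I.
        meta_grad += (1.0 - alpha) * loss_grad(adapted, center)
    return theta - beta * meta_grad / len(centers)

def meta_train(centers, alpha=0.1, beta=0.5, steps=200):
    """Repeat the outer update starting from zero-initialised meta-parameters."""
    theta = np.zeros_like(centers[0])
    for _ in range(steps):
        theta = meta_step(theta, centers, alpha, beta)
    return theta

# Three hypothetical tasks, each defined only by the minimiser of its loss.
centers = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([-1.0, 2.0])]
theta_meta = meta_train(centers)
```

In this toy setting the meta-parameters settle at the mean of the task minimisers, the starting point from which a single inner step does best on average across the tasks. A DDPG-based variant would replace `loss_grad` with gradients of the actor and critic losses computed from sampled transitions.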
Pages: 101-109
Page count: 8