UAV Path Planning for Wireless Data Harvesting: A Deep Reinforcement Learning Approach

Cited by: 44
Authors
Bayerlein, Harald [1]
Theile, Mirco [2]
Caccamo, Marco [2]
Gesbert, David [1]
Affiliations
[1] EURECOM, Commun Syst Dept, Sophia Antipolis, France
[2] Tech Univ Munich, TUM Dept Mech Engn, Munich, Germany
Source
2020 IEEE GLOBAL COMMUNICATIONS CONFERENCE (GLOBECOM) | 2020
Funding
European Research Council
DOI
10.1109/GLOBECOM42002.2020.9322234
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Autonomous deployment of unmanned aerial vehicles (UAVs) supporting next-generation communication networks requires efficient trajectory planning methods. We propose a new end-to-end reinforcement learning (RL) approach to UAV-enabled data collection from Internet of Things (IoT) devices in an urban environment. An autonomous drone is tasked with gathering data from distributed sensor nodes subject to limited flying time and obstacle avoidance. Previous approaches, both learning-based and non-learning-based, must perform expensive recomputations or relearn a behavior whenever important scenario parameters change, such as the number of sensors, the sensor positions, or the maximum flying time. In contrast, we train a double deep Q-network (DDQN) with combined experience replay to learn a UAV control policy that generalizes over changing scenario parameters. By feeding a multi-layer map of the environment through convolutional network layers to the agent, we show that the proposed network architecture enables the agent to make movement decisions across a variety of scenario parameters, balancing the data collection goal against flight time efficiency and safety constraints. We also illustrate the considerable gain in learning efficiency from using a map centered on the UAV's position instead of a non-centered map.
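The abstract pairs a double deep Q-network with combined experience replay. A minimal, framework-free sketch of those two ingredients follows; it is not the authors' implementation. It assumes the standard definitions: the double-DQN target lets the online network choose the next action and the target network evaluate it, and combined experience replay appends the newest transition to every uniformly sampled minibatch. The names `CombinedReplayBuffer` and `ddqn_targets` are hypothetical, the Q-networks are stand-in callables rather than the paper's convolutional network, and sampling with replacement is an assumption.

```python
import random
from collections import deque
from typing import Callable, Deque, List, Sequence, Tuple

# (state, action, reward, next_state, done); states are opaque placeholders
# here, standing in for the paper's map and scalar inputs (assumption).
Transition = Tuple[object, int, float, object, bool]
QFunction = Callable[[object], Sequence[float]]


class CombinedReplayBuffer:
    """Hypothetical replay buffer with combined experience replay (CER):
    each minibatch is uniform samples plus the most recent transition."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self.buffer: Deque[Transition] = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, transition: Transition) -> None:
        self.buffer.append(transition)  # oldest entry is evicted when full

    def sample(self, batch_size: int) -> List[Transition]:
        if not self.buffer or batch_size < 1:
            raise ValueError("need a non-empty buffer and batch_size >= 1")
        # batch_size - 1 uniform draws (with replacement, an assumption) ...
        batch = [self.rng.choice(self.buffer) for _ in range(batch_size - 1)]
        batch.append(self.buffer[-1])  # ... plus the newest transition
        return batch


def ddqn_targets(batch: List[Transition], q_online: QFunction,
                 q_target: QFunction, gamma: float) -> List[float]:
    """Double-DQN targets: the online net picks a', the target net scores it."""
    targets = []
    for _state, _action, reward, next_state, done in batch:
        if done:  # no bootstrapping from terminal states
            targets.append(reward)
            continue
        online_q = q_online(next_state)
        best_next = max(range(len(online_q)), key=lambda a: online_q[a])
        targets.append(reward + gamma * q_target(next_state)[best_next])
    return targets


if __name__ == "__main__":
    # Toy lookup tables standing in for the online and target networks.
    q_on = {"s1": [1.0, 3.0, 2.0]}
    q_tg = {"s1": [0.5, 0.25, 4.0]}
    buf = CombinedReplayBuffer(capacity=100)
    buf.add(("s0", 0, 1.0, "s1", False))
    buf.add(("s1", 1, -1.0, "s2", True))
    print(buf.sample(4)[-1])  # ('s1', 1, -1.0, 's2', True)
    print(ddqn_targets([("s0", 0, 1.0, "s1", False)],
                       lambda s: q_on[s], lambda s: q_tg[s], gamma=0.5))  # [1.125]
```

In this toy case a vanilla DQN target would bootstrap from the target network's maximum (4.0) and yield 3.0, whereas the double-DQN target uses the action the online network prefers and yields 1.125; decoupling action selection from evaluation in this way is what curbs overestimation.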
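The abstract also reports that a map centered on the UAV's position learns more efficiently than a non-centered one. The sketch below shows one way such centering could be done for each layer of a grid map, under stated assumptions: the centered grid measures (2H-1) x (2W-1) so the whole original map stays visible from any UAV cell, and cells with no counterpart in the original map are filled with a per-layer padding value (for example 1 for an obstacle layer, 0 for a data layer). The functions `center_layer` and `center_layers` and the padding values are hypothetical illustrations, not details taken from the paper.

```python
from typing import List, Sequence

Grid = List[List[int]]


def center_layer(layer: Grid, uav_row: int, uav_col: int, pad_value: int) -> Grid:
    """Shift an H x W layer into a (2H-1) x (2W-1) grid whose center cell is
    the UAV's cell; cells with no counterpart in the original map get pad_value."""
    height, width = len(layer), len(layer[0])
    if not (0 <= uav_row < height and 0 <= uav_col < width):
        raise ValueError("UAV position outside the map")
    mid_r, mid_c = height - 1, width - 1  # center index of the output grid
    out = [[pad_value] * (2 * width - 1) for _ in range(2 * height - 1)]
    for r in range(height):
        for c in range(width):
            out[r - uav_row + mid_r][c - uav_col + mid_c] = layer[r][c]
    return out


def center_layers(layers: Sequence[Grid], uav_row: int, uav_col: int,
                  pad_values: Sequence[int]) -> List[Grid]:
    """Center every layer of a multi-layer map, each with its own
    (assumed) padding value."""
    return [center_layer(layer, uav_row, uav_col, pad)
            for layer, pad in zip(layers, pad_values)]


if __name__ == "__main__":
    obstacles = [[0, 0, 1],
                 [0, 0, 0],
                 [1, 0, 0]]
    # UAV in the top-left corner: the map shifts down and right.
    for row in center_layer(obstacles, 0, 0, pad_value=1):
        print(row)
    # [1, 1, 1, 1, 1]
    # [1, 1, 1, 1, 1]
    # [1, 1, 0, 0, 1]
    # [1, 1, 0, 0, 0]
    # [1, 1, 1, 0, 0]
```

Because the UAV always occupies the same input cell, convolutional filters see the surroundings relative to the drone at a fixed position; this is one plausible intuition for the reported efficiency gain, not an explanation stated in this record.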
Pages: 6