Multi-AUV Pursuit-Evasion Game in the Internet of Underwater Things: An Efficient Training Framework via Offline Reinforcement Learning

Cited by: 2

Authors:

Xu, Jingzehua [1]

Zhang, Zekai [1]

Wang, Jingjing [2,3]

Han, Zhu [4,5]

Ren, Yong [6]

Affiliations:

[1] Tsinghua Univ, Tsinghua Shenzhen Int Grad Sch, Shenzhen 518055, Peoples R China

[2] Beihang Univ, Sch Cyber Sci & Technol, Beijing 100191, Peoples R China

[3] Beihang Univ, Hangzhou Innovat Inst, Hangzhou 310051, Peoples R China

[4] Univ Houston, Dept Elect & Comp Engn, Houston, TX 77004 USA

[5] Kyung Hee Univ, Dept Comp Sci & Engn, Seoul 446701, South Korea

[6] Tsinghua Univ, Dept Elect Engn, Beijing 100084, Peoples R China

Source:

IEEE INTERNET OF THINGS JOURNAL | 2024 / Vol. 11 / Issue 19

Funding:

National Natural Science Foundation of China; Japan Science and Technology Agency;

Keywords:

Games; Training; Target tracking; Sensors; Task analysis; Internet of Things; Transformers; Autonomous underwater vehicle (AUV); decision transformer (DT); finite-horizon Markov game process (FMGP); offline reinforcement learning (ORL); pursuit-evasion game;

DOI:

10.1109/JIOT.2024.3416616

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

In this article, we investigate the pursuit-evasion game of multiple autonomous underwater vehicles (AUVs) in a complex ocean environment. The pursuer AUVs must optimize their trajectories to avoid obstacles and dangerous vortex regions while pursuing the escaper AUV. Both the pursuers and the escaper can sense each other with limited detection capabilities, which they use for further pursuit or escape. Because the underwater pursuit-evasion (UPE) game is a high-dimensional NP-hard problem, we transform it into a finite-horizon Markov game process and propose an efficient training framework, based on offline reinforcement learning, that uses decentralized training and decentralized execution. During training, we propose a multiagent independent soft actor-critic to facilitate policy improvement and generate the offline data set, and a multiagent independent decision transformer for model training in the UPE game. Extensive simulations demonstrate the scalability and generalization ability of the proposed training framework, which achieves excellent performance in UPE games under different conditions and environments even though only a few AUVs participate in policy improvement to generate the high-quality offline data set.

Pages: 31273-31286

Number of pages: 14

