Approximating Nash equilibrium for anti-UAV jamming Markov game using a novel event-triggered multi-agent reinforcement learning

Cited: 17
Authors
Feng, Zikai [1 ,2 ]
Huang, Mengxing [1 ,2 ]
Wu, Yuanyuan [1 ]
Wu, Di [1 ,3 ]
Cao, Jinde [4 ,5 ]
Korovin, Iakov [6 ]
Gorbachev, Sergey [7 ]
Gorbacheva, Nadezhda [6 ]
Affiliations
[1] Hainan Univ, Sch Informat & Commun Engn, Haikou 570228, Peoples R China
[2] State Key Lab Marine Resource Utilizat South China, Haikou 570228, Peoples R China
[3] Shanghai Jiao Tong Univ, Dept Automation, Shanghai 200240, Peoples R China
[4] Southeast Univ, Sch Math, Nanjing 210096, Peoples R China
[5] Yonsei Univ, Yonsei Frontier Lab, Seoul 03722, South Korea
[6] Southern Fed Univ, Sci Res Inst Multiprocessor Comp Syst, 2,Chekhov st, Taganrog 347928, Russia
[7] Russian Acad Engn, 9, Bldg 4,Gazetny pereulok, Moscow 125009, Russia
Keywords
Anti-jamming Markov game; Event-triggered multi-agent deep reinforcement learning; Beta strategy; Nash equilibrium
DOI
10.1016/j.neunet.2022.12.022
CLC number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In downlink communication, it is challenging for ground users to cope with uncertain interference from intelligent aerial jammers. The cooperation and competition between ground users and unmanned aerial vehicle (UAV) jammers give rise to an anti-UAV jamming Markov game. A model-free method based on multi-agent reinforcement learning (MARL) is therefore adopted to handle this game. However, benchmark MARL strategies suffer from dimension explosion and convergence to local optima. To solve these issues, a novel event-triggered multi-agent proximal policy optimization algorithm with Beta strategy (ETMAPPO) is proposed in this paper, which aims to reduce the dimension of transmitted information and improve the efficiency of policy convergence. Under the event-triggering mechanism, agents learn to acquire observations at appropriate moments, thereby reducing the transmission of low-value information. A Beta operator is used to optimize the action search, expanding the search scope of the policy space. Ablation simulations show that the proposed strategy achieves better global benefits with a lower information dimension than benchmark algorithms. In addition, the convergence results verify that the well-trained ETMAPPO achieves stable jamming strategies and stable anti-jamming strategies, which approximately constitute a Nash equilibrium of the anti-jamming Markov game. © 2023 Elsevier Ltd. All rights reserved.
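The abstract names two mechanisms: an event-triggering rule that limits when observations are transmitted, and a Beta strategy that bounds the sampled actions. The paper itself holds the details; purely as a rough illustration, the Python sketch below shows a generic Beta-distribution policy head (PyTorch) plus a hypothetical fixed-threshold trigger. Nothing here reproduces the authors' ETMAPPO: the network sizes, the should_transmit rule, and all names are assumptions made for this sketch.

import torch
import torch.nn as nn
from torch.distributions import Beta

class BetaPolicy(nn.Module):
    """Beta-distribution policy head: actions are sampled in (0, 1)."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
        )
        # Separate heads for the two Beta shape parameters.
        self.alpha_head = nn.Linear(hidden, act_dim)
        self.beta_head = nn.Linear(hidden, act_dim)

    def forward(self, obs: torch.Tensor) -> Beta:
        h = self.body(obs)
        # softplus(.) + 1 keeps both shape parameters above 1, so the
        # density is unimodal with support covering the whole interval.
        alpha = nn.functional.softplus(self.alpha_head(h)) + 1.0
        beta = nn.functional.softplus(self.beta_head(h)) + 1.0
        return Beta(alpha, beta)

def should_transmit(prev_obs: torch.Tensor, obs: torch.Tensor,
                    threshold: float = 0.1) -> bool:
    # Hypothetical fixed-threshold event trigger: transmit only when the
    # observation deviates enough from the last transmitted one. In the
    # paper the triggering behaviour is learned, not fixed; this stand-in
    # exists only to make the sketch self-contained.
    return bool(torch.norm(obs - prev_obs) > threshold)

# Usage: draw a bounded action and its log-probability for a PPO-style loss.
policy = BetaPolicy(obs_dim=8, act_dim=2)
dist = policy(torch.randn(1, 8))
action = dist.sample()                    # each component lies in (0, 1)
logp = dist.log_prob(action).sum(-1)      # summed over action dimensions

Because Beta samples are confined to (0, 1), they can be rescaled linearly to any bounded action range, which avoids the clipping bias a Gaussian policy incurs at the action limits; this is the usual motivation for Beta policies in continuous-control PPO.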
Pages: 330-342
Page count: 13