Eavesdropping Game Based on Multi-Agent Deep Reinforcement Learning

Cited by: 0
Authors
Guo, Delin [1 ]
Tang, Lan [1 ]
Yang, Lvxi [2 ]
Liang, Ying-Chang [2 ]
Affiliations
[1] Nanjing Univ, Nanjing, Peoples R China
[2] Southeast Univ, Nanjing, Peoples R China
Source
2022 IEEE 23RD INTERNATIONAL WORKSHOP ON SIGNAL PROCESSING ADVANCES IN WIRELESS COMMUNICATION (SPAWC) | 2022
Funding
National Key R&D Program of China; National Natural Science Foundation of China
Keywords
Physical layer security; proactive eavesdropping; stochastic game; multi-agent reinforcement learning; wiretap channel
DOI
10.1109/SPAWC51304.2022.9833927
Chinese Library Classification (CLC)
TP301 [Theory and Methods];
Discipline Code
081202;
Abstract
This paper considers an adversarial scenario between a legitimate eavesdropper and a suspicious communication pair, where all three nodes are equipped with multiple antennas. The eavesdropper, operating in full-duplex mode, aims to wiretap the suspicious pair via proactive jamming, while the suspicious transmitter, which can send artificial noise (AN) to disturb the wiretap channel, aims to guarantee secrecy. Specifically, the eavesdropper adjusts its jamming power to enhance the wiretap rate, while the suspicious transmitter jointly adapts its transmit power and AN power to counter the eavesdropping. Given the partial observations, the complicated interactions between the eavesdropper and the suspicious pair, and the unknown system dynamics, we model the problem as an imperfect-information stochastic game. To approach a Nash equilibrium of the eavesdropping game, we develop a multi-agent reinforcement learning (MARL) algorithm, termed neural fictitious self-play with soft actor-critic (NFSP-SAC), which combines fictitious self-play (FSP) with the deep reinforcement learning algorithm SAC. The introduction of SAC enables FSP to handle problems with continuous, high-dimensional observation and action spaces. Simulation results demonstrate that the power allocation policies learned by our method empirically converge to a Nash equilibrium, whereas the baseline reinforcement learning algorithms suffer from severe fluctuations during learning.
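The abstract describes NFSP-SAC only at a high level. The sketch below illustrates the NFSP control flow the method builds on (anticipatory mixing of a best-response policy and an average policy, following Heinrich & Silver's NFSP), with the SAC learner stubbed out. This is a minimal sketch under stated assumptions, not the authors' implementation: all class and variable names are hypothetical, and the placeholder policies and buffer logic are deliberate simplifications.

```python
import random
import numpy as np

class NFSPAgent:
    """Minimal NFSP control-flow sketch (hypothetical; SAC learner stubbed).

    With probability eta the agent plays its best-response policy (in the
    paper, a SAC actor over continuous power levels) and records the
    (obs, action) pair for supervised learning of its average policy;
    otherwise it plays the average policy.
    """

    def __init__(self, act_dim, eta=0.1, buf_size=100_000):
        self.eta = eta            # anticipatory parameter (mixing probability)
        self.act_dim = act_dim
        self.rl_buffer = []       # transitions feeding the (stubbed) SAC learner
        self.sl_buffer = []       # (obs, action) pairs for the average policy
        self.buf_size = buf_size

    def best_response_action(self, obs):
        # Placeholder for the SAC actor: a clipped Gaussian over power levels.
        return np.clip(np.random.normal(0.5, 0.2, self.act_dim), 0.0, 1.0)

    def average_policy_action(self, obs):
        # Placeholder for the supervised average-policy network: replay a
        # stored best-response action (crude stand-in, ignores obs).
        if self.sl_buffer:
            _, a = random.choice(self.sl_buffer)
            return a
        return np.full(self.act_dim, 0.5)

    def act(self, obs):
        if random.random() < self.eta:
            a = self.best_response_action(obs)
            # Store for supervised learning (crude stand-in for NFSP's
            # reservoir buffer: random replacement once full).
            if len(self.sl_buffer) < self.buf_size:
                self.sl_buffer.append((obs, a))
            else:
                self.sl_buffer[random.randrange(self.buf_size)] = (obs, a)
            return a
        return self.average_policy_action(obs)

    def observe(self, transition):
        self.rl_buffer.append(transition)   # (s, a, r, s') for the SAC learner

    def update(self):
        # Full algorithm: one SAC gradient step on rl_buffer (best response)
        # and one supervised step on sl_buffer (average policy). Stubbed here.
        pass

# Toy interaction loop for the two sides of the game (placeholder dynamics):
# agent 0 = eavesdropper (jamming power), agent 1 = transmitter (transmit, AN).
agents = [NFSPAgent(act_dim=1), NFSPAgent(act_dim=2)]
obs = np.zeros(4)                            # placeholder observation
for step in range(100):
    actions = [ag.act(obs) for ag in agents]
    for ag in agents:
        ag.observe((obs, actions, 0.0, obs))  # placeholder reward and next state
        ag.update()
```

As both agents' average policies are trained on their own best responses, their empirical strategies track the time-averaged play, which is what allows the learned power allocation policies to converge toward a Nash equilibrium rather than oscillate.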
Pages: 5