Dynamic Spectrum Anti-Jamming With Reinforcement Learning Based on Value Function Approximation

Cited by: 9

Authors:

Zhu, Xinyu [1]

Huang, Yang [1]

Wang, Shaoyu [1]

Wu, Qihui [1]

Ge, Xiaohu [2]

Liu, Yuan [3]

Gao, Zhen [4]

Affiliations:

[1] Nanjing Univ Aeronaut & Astronaut, Key Lab Dynam Cognit Syst Electromagnet Spectrum S, Minist Ind & Informat Technol, Nanjing 210016, Peoples R China

[2] Huazhong Univ Sci & Technol, Sch Elect Informat & Commun, Wuhan 430074, Peoples R China

[3] South China Univ Technol, Sch Elect & Informat Engn, Guangzhou 510641, Peoples R China

[4] Beijing Inst Technol, Sch Informat & Elect, Beijing 100081, Peoples R China

Source:

IEEE WIRELESS COMMUNICATIONS LETTERS | 2023 / Vol. 12 / No. 02

Funding:

National Natural Science Foundation of China;

Keywords:

Jamming; Internet of Things; Wireless networks; Time-frequency analysis; Interference; Decision making; Channel estimation; Uplink transmissions; anti-jamming; Markov decision process; reinforcement learning; ALGORITHM;

DOI:

10.1109/LWC.2022.3228045

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Subject classification code:

0812 ;

Abstract:

This letter addresses the spectrum anti-jamming problem with multiple Internet of Things (IoT) devices for uplink transmissions, where policies for configuring frequency-domain channels have to be learned without the knowledge of the time-frequency distribution of the interference. The problem of decision-making or learning is expected to be solved by reinforcement learning (RL) approaches. However, the state-of-the-art RL-based spectrum anti-jamming methods may not be applicable in IoT systems, suffer from high computational complexity or may converge to a policy that may not be the best for each user. Therefore, we propose a novel spectrum anti-jamming scheme where configuration policies for the IoT devices are sequentially optimized with value function approximation-based multi-agent RL. Simulation results show that our proposed algorithm outperforms various baselines in terms of average normalized throughput.
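To make the idea of value function approximation concrete, the following is a minimal single-agent sketch, not the multi-agent scheme proposed in the letter. It assumes a hypothetical sweeping jammer that moves to the next channel in every slot, a state equal to the channel jammed in the previous slot, and a one-hot feature map; the names `features`, `q_value`, `greedy_action`, and `train_agent`, and all parameter values, are illustrative assumptions.

```python
import random

def features(state, action, num_channels):
    """One-hot feature vector phi(s, a) for a linear approximator (assumed form)."""
    phi = [0.0] * (num_channels * num_channels)
    phi[state * num_channels + action] = 1.0
    return phi

def q_value(weights, state, action, num_channels):
    """Approximate Q(s, a) as the inner product of weights and phi(s, a)."""
    phi = features(state, action, num_channels)
    return sum(w * p for w, p in zip(weights, phi))

def greedy_action(weights, state, num_channels):
    """Channel with the highest approximate Q-value in the given state."""
    return max(range(num_channels),
               key=lambda a: q_value(weights, state, a, num_channels))

def train_agent(num_channels=4, steps=20000, alpha=0.2, gamma=0.5,
                epsilon=0.2, seed=0):
    """Semi-gradient Q-learning against a hypothetical sweeping jammer."""
    rng = random.Random(seed)
    weights = [0.0] * (num_channels * num_channels)
    state = 0  # channel jammed in the previous slot
    for _ in range(steps):
        # Epsilon-greedy channel selection.
        if rng.random() < epsilon:
            action = rng.randrange(num_channels)
        else:
            action = greedy_action(weights, state, num_channels)
        # Assumed jammer model: it sweeps to the next channel each slot.
        jammed = (state + 1) % num_channels
        reward = 1.0 if action != jammed else 0.0
        next_state = jammed
        # Temporal-difference error with a max over next-slot channels.
        best_next = max(q_value(weights, next_state, a, num_channels)
                        for a in range(num_channels))
        td_error = (reward + gamma * best_next
                    - q_value(weights, state, action, num_channels))
        # Gradient of a linear approximator with respect to weights is phi.
        phi = features(state, action, num_channels)
        weights = [w + alpha * td_error * p for w, p in zip(weights, phi)]
        state = next_state
    return weights

if __name__ == "__main__":
    learned = train_agent()
    for s in range(4):
        print("last jammed:", s, "-> transmit on:", greedy_action(learned, s, 4))
```

With one-hot features the approximator reduces to a table; a more compact feature map would be needed for the generalization that makes value function approximation attractive in large state spaces.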

Citation

Pages: 386-390

Page count: 5

References (18 in total):

[11] Liu, Xin; Xu, Yuhua; Jia, Luliang; Wu, Qihui; Anpalagan, Alagan. Anti-Jamming Communications Using Spectrum Waterfall: A Deep Reinforcement Learning Approach [J]. IEEE COMMUNICATIONS LETTERS, 2018, 22 (05): 998-1001

[12] Melo F. S., 2001, Rep.

[13] Powell WB, 2007, APPROXIMATE DYNAMIC PROGRAMMING: SOLVING THE CURSES OF DIMENSIONALITY, P1, DOI 10.1002/9780470182963

[14] Tsiligkaridis T, 2018, IEEE GLOB CONF SIG, P579, DOI 10.1109/GlobalSIP.2018.8646702

[15] Wang, Shaoyu; Huang, Yang; Clerckx, Bruno. Dynamic Air-Ground Collaboration for Multi-Access Edge Computing [J]. IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS (ICC 2022), 2022: 5365-5371

[16] Wang, Ximing; Wang, Jinlong; Xu, Yuhua; Chen, Jin; Jia, Luliang; Liu, Xin; Yang, Yijun. Dynamic Spectrum Anti-Jamming Communications: Challenges and Opportunities [J]. IEEE COMMUNICATIONS MAGAZINE, 2020, 58 (02): 79-85

[17] Xu, Xin; Hu, Dewen; Lu, Xicheng. Kernel-based least squares policy iteration for reinforcement learning [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS, 2007, 18 (04): 973-992

[18] Yao, Fuqiang; Jia, Luliang. A Collaborative Multi-Agent Reinforcement Learning Anti-Jamming Algorithm in Wireless Networks [J]. IEEE WIRELESS COMMUNICATIONS LETTERS, 2019, 8 (04): 1024-1027
