Dynamic Spectrum Anti-Jamming With Reinforcement Learning Based on Value Function Approximation

Cited by: 9

Authors:

Zhu, Xinyu [1]

Huang, Yang [1]

Wang, Shaoyu [1]

Wu, Qihui [1]

Ge, Xiaohu [2]

Liu, Yuan [3]

Gao, Zhen [4]

Affiliations:

[1] Nanjing Univ Aeronaut & Astronaut, Key Lab Dynam Cognit Syst Electromagnet Spectrum S, Minist Ind & Informat Technol, Nanjing 210016, Peoples R China

[2] Huazhong Univ Sci & Technol, Sch Elect Informat & Commun, Wuhan 430074, Peoples R China

[3] South China Univ Technol, Sch Elect & Informat Engn, Guangzhou 510641, Peoples R China

[4] Beijing Inst Technol, Sch Informat & Elect, Beijing 100081, Peoples R China

Source:

IEEE WIRELESS COMMUNICATIONS LETTERS | 2023 / Vol. 12 / No. 02

Funding:

National Natural Science Foundation of China;

Keywords:

Jamming; Internet of Things; Wireless networks; Time-frequency analysis; Interference; Decision making; Channel estimation; Uplink transmissions; anti-jamming; Markov decision process; reinforcement learning; ALGORITHM;

DOI:

10.1109/LWC.2022.3228045

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

This letter addresses the spectrum anti-jamming problem with multiple Internet of Things (IoT) devices for uplink transmissions, where policies for configuring frequency-domain channels have to be learned without the knowledge of the time-frequency distribution of the interference. The problem of decision-making or learning is expected to be solved by reinforcement learning (RL) approaches. However, the state-of-the-art RL-based spectrum anti-jamming methods may not be applicable in IoT systems, suffer from high computational complexity or may converge to a policy that may not be the best for each user. Therefore, we propose a novel spectrum anti-jamming scheme where configuration policies for the IoT devices are sequentially optimized with value function approximation-based multi-agent RL. Simulation results show that our proposed algorithm outperforms various baselines in terms of average normalized throughput.


Pages: 386-390

Page count: 5

References: 18 in total

[1] Aref MA, 2017, IEEE WCNC

[2] Deep Reinforcement Learning: A Brief Survey [J]. Arulkumaran, Kai; Deisenroth, Marc Peter; Brundage, Miles; Bharath, Anil Anthony. IEEE SIGNAL PROCESSING MAGAZINE, 2017, 34 (06): 26-38

[3] A comprehensive survey of multiagent reinforcement learning [J]. Busoniu, Lucian; Babuska, Robert; De Schutter, Bart. IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART C-APPLICATIONS AND REVIEWS, 2008, 38 (02): 156-172

[4] Chen YF, 2005, IEEE COMMUN LETT, V9, P688, DOI [10.1109/LCOMM.2005.08002, 10.1109/LCOMM.2005.1496583]

[5] The kernel recursive least-squares algorithm [J]. Engel, Y; Mannor, S; Meir, R. IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2004, 52 (08): 2275-2285

[6] Dynamic Resource Configuration for Low-Power IoT Networks: A Multi-Objective Reinforcement Learning Method [J]. Huang, Yang; Hao, Caiyong; Mao, Yijie; Zhou, Fuhui. IEEE COMMUNICATIONS LETTERS, 2021, 25 (07): 2285-2289

[7] An In-Depth Analysis of IoT Security Requirements, Challenges, and Their Countermeasures via Software-Defined Security [J]. Iqbal, Waseem; Abbas, Haider; Daneshmand, Mahmoud; Rauf, Bilal; Bangash, Yawar Abbas. IEEE INTERNET OF THINGS JOURNAL, 2020, 7 (10): 10250-10276

[8] Game-Theoretic Learning Anti-Jamming Approaches in Wireless Networks [J]. Jia, Luliang; Qi, Nan; Chu, Feihuang; Fang, Shengliang; Wang, Ximing; Ma, Shuli; Feng, Shuo. IEEE COMMUNICATIONS MAGAZINE, 2022, 60 (05): 60-66

[9] Stackelberg Game Approaches for Anti-Jamming Defence in Wireless Networks [J]. Jia, Luliang; Xu, Yuhua; Sun, Youming; Feng, Shuo; Anpalagan, Alagan. IEEE WIRELESS COMMUNICATIONS, 2018, 25 (06): 120-128

[10] Dynamic Spectrum Anti-Jamming in Broadband Communications: A Hierarchical Deep Reinforcement Learning Approach [J]. Li, Yangyang; Xu, Yuhua; Xu, Yitao; Liu, Xin; Wang, Ximing; Li, Wen; Anpalagan, Alagan. IEEE WIRELESS COMMUNICATIONS LETTERS, 2020, 9 (10): 1616-1619
