Distributed Deep Reinforcement Learning with Wideband Sensing for Dynamic Spectrum Access

Cited by: 6
Authors
Kaytaz, Umuralp [1 ]
Ucar, Seyhan [3 ]
Akgun, Bans [2 ]
Coleri, Sinem [1 ]
Affiliations
[1] Koc Univ, Dept Elect & Elect Engn, Istanbul, Turkey
[2] Koc Univ, Dept Comp Engn, Istanbul, Turkey
[3] Toyota Motor North Amer R&D, InfoTech Labs, Mountain View, CA USA
Source
2020 IEEE WIRELESS COMMUNICATIONS AND NETWORKING CONFERENCE (WCNC) | 2020
Keywords
Cognitive radio; dynamic spectrum access; deep reinforcement learning; medium access control (MAC); OPTIMALITY;
DOI
10.1109/wcnc45663.2020.9120840
Chinese Library Classification
TP [Automation Technology; Computer Technology]
Discipline Code
0812
Abstract
Dynamic Spectrum Access (DSA) improves spectrum utilization by allowing secondary users (SUs) to opportunistically access temporarily idle periods in primary user (PU) channels. Previous studies on utility-maximizing spectrum access strategies mostly require complete network state information and therefore may not be practical. Model-free reinforcement learning (RL) methods, such as Q-learning, on the other hand, are promising adaptive solutions that do not require complete network information. In this paper, we tackle this research dilemma and propose deep Q-learning originated spectrum access (DQLS) based decentralized and centralized channel selection methods for network utility maximization, namely DEcentralized Spectrum Allocation (DESA) and Centralized Spectrum Allocation (CSA), respectively. CSA utilizes actions generated by a centralized deep Q-network (DQN), whereas DESA adopts a non-cooperative approach to spectrum decisions. We use extensive simulations to investigate the spectrum utilization of the proposed methods for varying primary and secondary network sizes. Our findings demonstrate that the proposed schemes outperform model-based RL and traditional approaches, including slotted-Aloha and the Whittle index policy, achieving 87% of optimal channel access.
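As a rough illustration of the model-free, Q-learning-style channel selection the abstract contrasts with model-based approaches, the toy sketch below lets a single SU learn which PU channel is most often idle from observed rewards alone. This is not the paper's DESA/CSA method (which uses a deep Q-network over a multi-user network); the channel idle probabilities, hyperparameters, and the stateless tabular update are all invented for illustration.

```python
import random

random.seed(0)

# Hypothetical toy model: 3 PU channels, each independently idle in a given
# slot with a fixed probability (values chosen purely for illustration).
IDLE_PROB = [0.2, 0.5, 0.8]
N_CHANNELS = len(IDLE_PROB)

ALPHA, EPS = 0.1, 0.1           # learning rate and exploration rate
q = [0.0] * N_CHANNELS          # per-channel action-value estimates

def step(channel):
    """SU transmits on `channel`; reward 1 if the PU left it idle, else 0."""
    return 1.0 if random.random() < IDLE_PROB[channel] else 0.0

for _ in range(20000):
    # epsilon-greedy channel selection: explore occasionally, else exploit
    if random.random() < EPS:
        a = random.randrange(N_CHANNELS)
    else:
        a = max(range(N_CHANNELS), key=lambda c: q[c])
    r = step(a)
    # model-free update: move the estimate toward the observed reward,
    # with no model of the PU occupancy process
    q[a] += ALPHA * (r - q[a])

best = max(range(N_CHANNELS), key=lambda c: q[c])
print(best)  # the learner should settle on the most-often-idle channel
```

The same observe-reward-and-update loop, with the table replaced by a neural network over sensed channel states, is the essence of the deep Q-learning approach the abstract describes.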
Pages: 6