Distributed Deep Reinforcement Learning with Wideband Sensing for Dynamic Spectrum Access

Cited by: 6

Authors:

Kaytaz, Umuralp [1]

Ucar, Seyhan [3]

Akgun, Baris [2]

Coleri, Sinem [1]

Affiliations:

[1] Koc Univ, Dept Elect & Elect Engn, Istanbul, Turkey

[2] Koc Univ, Dept Comp Engn, Istanbul, Turkey

[3] Toyota Motor North Amer R&D, InfoTech Labs, Mountain View, CA USA

Source:

2020 IEEE WIRELESS COMMUNICATIONS AND NETWORKING CONFERENCE (WCNC) | 2020

Keywords:

Cognitive radio; dynamic spectrum access; deep reinforcement learning; medium access control (MAC); optimality

DOI:

10.1109/wcnc45663.2020.9120840

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Dynamic Spectrum Access (DSA) improves spectrum utilization by allowing secondary users (SUs) to opportunistically access temporary idle periods in the primary user (PU) channels. Previous studies on utility-maximizing spectrum access strategies mostly require complete network state information and therefore may not be practical. Model-free reinforcement learning (RL) based methods, such as Q-learning, on the other hand, are promising adaptive solutions that do not require complete network information. In this paper, we tackle this research dilemma and propose deep Q-learning originated spectrum access (DQLS) based decentralized and centralized channel selection methods for network utility maximization, namely DEcentralized Spectrum Allocation (DESA) and Centralized Spectrum Allocation (CSA), respectively. Actions generated by a centralized deep Q-network (DQN) are used in CSA, whereas DESA adopts a non-cooperative approach to spectrum decisions. We use extensive simulations to investigate the spectrum utilization of our proposed methods for varying primary and secondary network sizes. Our findings demonstrate that the proposed schemes outperform model-based RL and traditional approaches, including slotted-Aloha and the Whittle index policy, while achieving 87% of optimal channel access.
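
The model-free idea mentioned in the abstract can be illustrated with a rough sketch. This is not the paper's DQN-based DESA or CSA: it is plain tabular Q-learning for a single SU, and the two-state Markov channel model, the observation-based state (last channel sensed and whether it was idle), the function name, and all parameter values are assumptions made only for illustration.

```python
import random


def simulate_q_learning_access(p_busy_to_idle, p_idle_to_busy, slots=20000,
                               alpha=0.5, gamma=0.5, epsilon=0.1, seed=0):
    """Hypothetical helper: one SU learns which PU channel to access.

    Returns (fraction of slots with a successful access, Q-table).
    """
    rng = random.Random(seed)
    n = len(p_busy_to_idle)
    idle = [False] * n  # hidden PU channel states, never read directly by the SU
    # Q-table keyed by the SU's local observation: (last channel, was it idle?)
    q = {(c, o): [0.0] * n for c in range(n) for o in (0, 1)}
    state = (0, 0)
    successes = 0
    for _ in range(slots):
        # Each PU channel evolves as an independent two-state Markov chain.
        for c in range(n):
            if idle[c]:
                idle[c] = rng.random() >= p_idle_to_busy[c]
            else:
                idle[c] = rng.random() < p_busy_to_idle[c]
        # Epsilon-greedy channel selection from the current Q-values.
        if rng.random() < epsilon:
            action = rng.randrange(n)
        else:
            action = max(range(n), key=lambda a: q[state][a])
        # Reward 1 for accessing an idle channel, 0 for hitting a busy one.
        reward = 1.0 if idle[action] else 0.0
        next_state = (action, int(reward))
        # Standard Q-learning temporal-difference update.
        td_target = reward + gamma * max(q[next_state])
        q[state][action] += alpha * (td_target - q[state][action])
        state = next_state
        successes += int(reward)
    return successes / slots, q
```

Calling, for example, `simulate_q_learning_access([0.2, 0.8, 0.5], [0.6, 0.1, 0.5])` returns the fraction of slots in which the SU found its chosen channel idle, together with the learned Q-table. The agent only ever uses its own sensing outcome, which is the sense in which no complete network state is needed.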


Pages: 6
