Stochastic Power Adaptation with Multiagent Reinforcement Learning for Cognitive Wireless Mesh Networks

Times cited: 57

Authors:

Chen, Xianfu [1]

Zhao, Zhifeng [2]

Zhang, Honggang [2]

Affiliations:

[1] VTT Tech Res Ctr Finland, FI-90571 Oulu, Finland

[2] Zhejiang Univ, Dept Informat Sci & Elect Engn, Hangzhou 310027, Zhejiang, Peoples R China

Source:

IEEE TRANSACTIONS ON MOBILE COMPUTING | 2013 / Vol. 12 / No. 11

Funding:

National Natural Science Foundation of China;

Keywords:

Cognitive radio; resource allocation; algorithm/protocol design and analysis; reinforcement learning; RADIO; GAME;

DOI:

10.1109/TMC.2012.178

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Subject classification code:

0812;

Abstract:

As the scarce spectrum resource becomes increasingly overcrowded, cognitive radio offers great flexibility for improving spectrum efficiency by opportunistically accessing licensed frequency bands. One of the critical challenges in operating such radios in a network is how to efficiently allocate transmission power and frequency resources among the secondary users (SUs) while satisfying the quality-of-service constraints of the primary users. In this paper, we focus on the noncooperative power allocation problem in cognitive wireless mesh networks formed by a number of clusters, taking energy efficiency into account. Owing to the SUs' dynamic and spontaneous properties, the problem is modeled as a stochastic learning process. We first extend single-agent Q-learning to a multiuser context, and then propose a conjecture-based multiagent Q-learning algorithm that achieves the optimal transmission strategies with only private and incomplete information. An intelligent SU performs Q-function updates based on its conjecture about the other SUs' stochastic behaviors. The learning algorithm provably converges under certain restrictions that arise during the learning procedure. Simulation experiments verify the performance of the algorithm and demonstrate its effectiveness in improving energy efficiency.
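The abstract's starting point, single-agent Q-learning extended to several SUs, can be illustrated with a minimal sketch. It assumes one simple extension: each SU runs its own stateless Q-learner over a hypothetical set of discrete power levels and treats the other SUs as part of the environment (independent Q-learning), with a reward equal to its energy efficiency, taken here as Shannon rate divided by transmit plus circuit power. Whether this matches the paper's own multiuser extension is an assumption. The power levels, noise power, circuit power, channel gains, learning parameters, and the helper names `energy_efficiency`, `independent_q_learning`, and `greedy_powers` are all hypothetical. The sketch deliberately omits the paper's conjecture-based update and the primary users' QoS constraints, so it is not the authors' algorithm.

```python
import math
import random

# Hypothetical parameters for illustration only (not from the paper).
POWER_LEVELS = [0.1, 0.2, 0.5, 1.0]  # discrete transmit powers (W)
NOISE = 0.01                          # receiver noise power (W)
CIRCUIT_POWER = 0.1                   # fixed circuit power per SU (W)


def energy_efficiency(i, powers, gains):
    """Rate per consumed watt for SU i: log2(1 + SINR) / (p_i + P_c).

    gains[j][i] is the (hypothetical) channel gain from SU j's transmitter
    to SU i's receiver.
    """
    interference = sum(gains[j][i] * powers[j]
                       for j in range(len(powers)) if j != i)
    sinr = gains[i][i] * powers[i] / (NOISE + interference)
    return math.log2(1.0 + sinr) / (powers[i] + CIRCUIT_POWER)


def independent_q_learning(gains, episodes=5000, alpha=0.1, gamma=0.0,
                           epsilon=0.1, seed=0):
    """Each SU keeps its own Q-table over power levels.

    The other SUs are treated as part of the environment. With a single
    (stateless) state, the update
        Q(a) <- Q(a) + alpha * (r + gamma * max_a' Q(a') - Q(a))
    is applied to the action each SU just played.
    """
    rng = random.Random(seed)
    n = len(gains)
    n_actions = len(POWER_LEVELS)
    q = [[0.0] * n_actions for _ in range(n)]
    for _ in range(episodes):
        # Epsilon-greedy action selection, made privately by each SU.
        actions = []
        for i in range(n):
            if rng.random() < epsilon:
                actions.append(rng.randrange(n_actions))
            else:
                actions.append(max(range(n_actions), key=q[i].__getitem__))
        powers = [POWER_LEVELS[a] for a in actions]
        # Each SU observes only its own reward (private information).
        for i, a in enumerate(actions):
            reward = energy_efficiency(i, powers, gains)
            target = reward + gamma * max(q[i])
            q[i][a] += alpha * (target - q[i][a])
    return q


def greedy_powers(q):
    """Power level each SU would pick greedily from its learned Q-table."""
    return [POWER_LEVELS[max(range(len(row)), key=row.__getitem__)]
            for row in q]


# Usage: two SUs with strong direct links and weak cross interference.
gains = [[1.0, 0.1],
         [0.1, 1.0]]
q_tables = independent_q_learning(gains)
print(greedy_powers(q_tables))
```

With `gamma=0` and a constant `alpha`, each Q-value is an exponentially weighted average of the energy efficiency that SU obtained with that power level. Because the other SUs keep changing their powers, each independent learner faces a nonstationary environment, so this sketch carries no convergence guarantee.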


Pages: 2155-2166

Number of pages: 12

Number of references: 28
