Distributed Power Control for Large Energy Harvesting Networks: A Multi-Agent Deep Reinforcement Learning Approach

Cited by: 35
Authors
Sharma, Mohit K. [1 ]
Zappone, Alessio [2 ,3 ]
Assaad, Mohamad [1 ]
Debbah, Merouane [1 ,4 ]
Vassilaras, Spyridon [4 ]
Affiliations
[1] Univ Paris Saclay, Cent Supelec, F-91192 Gif Sur Yvette, France
[2] Cent Supelec, Lab Signaux & Syst, F-91190 Gif Sur Yvette, France
[3] Univ Cassino & Southern Lazio, Dept Elect & Informat Engn, I-03043 Cassino, Italy
[4] Huawei France R&D, Math & Algorithm Sci Lab, F-92100 Paris, France
Funding
EU Horizon 2020; European Research Council;
Keywords
Artificial neural networks; distributed algorithms; energy harvesting; learning (artificial intelligence); multi-agent systems; multiple access; THEORETIC APPROACH; IOT;
DOI
10.1109/TCCN.2019.2949589
Chinese Library Classification
TN [Electronic Technology, Communication Technology];
Discipline Code
0809 ;
Abstract
In this paper, we develop a multi-agent reinforcement learning (MARL) framework to obtain online power control policies for a large energy harvesting (EH) multiple access channel, when only causal information about the EH process and the wireless channel is available. In the proposed framework, we model the online power control problem as a discrete-time mean-field game (MFG), and analytically show that the MFG has a unique stationary solution. Next, we leverage the fictitious play property of mean-field games, together with deep reinforcement learning, to learn the stationary solution of the game in a completely distributed fashion. We analytically show that the proposed procedure converges to the unique stationary solution of the MFG, which in turn ensures that the optimal policies can be learned in a completely distributed fashion. To benchmark the performance of the distributed policies, we also develop deep neural network (DNN) based centralized and distributed online power control schemes. Our simulation results show the efficacy of the proposed power control policies. In particular, the DNN-based centralized power control policies perform very well for large EH networks, for which the design of optimal policies is intractable using conventional methods such as Markov decision processes. Further, the throughput of both distributed policies is close to that achieved by the centralized policies.
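The fictitious-play idea behind the learning procedure can be illustrated with a toy mean-field power-control game. The sketch below is illustrative only and is not the paper's algorithm or model: the power grid `POWERS`, the constants `NOISE` and `ENERGY_COST`, and the log-throughput reward are all assumptions. Symmetric agents repeatedly best-respond to the running average of the population's mean transmit power (the mean field), and the averaged play settles at a self-consistent stationary point.

```python
import numpy as np

# Toy illustration (not the paper's algorithm): fictitious play on a
# symmetric mean-field power-control game. Each agent picks a transmit
# power from a discrete grid; its reward is a log-throughput term whose
# interference is the population's mean power (the "mean field"), minus
# a linear energy cost. All constants below are assumed for illustration.
POWERS = np.linspace(0.0, 1.0, 101)  # candidate transmit power levels
NOISE = 0.1                          # receiver noise level
ENERGY_COST = 1.0                    # price per unit of transmit power

def best_response(mean_power):
    """Best power level against a fixed mean-field interference level."""
    rewards = np.log1p(POWERS / (NOISE + mean_power)) - ENERGY_COST * POWERS
    return POWERS[np.argmax(rewards)]

def fictitious_play(iterations=200):
    """Iterate: best-respond to the running average of past mean fields."""
    avg_mean_field = 0.5             # initial guess for the mean power
    for t in range(1, iterations + 1):
        br = best_response(avg_mean_field)
        # Agents are symmetric, so the new mean field equals the best
        # response; fold it into the running average (fictitious play).
        avg_mean_field += (br - avg_mean_field) / t
    return avg_mean_field, best_response(avg_mean_field)

mf, policy_power = fictitious_play()
print(f"stationary mean power ~ {mf:.3f}, best response ~ {policy_power:.3f}")
```

At the fixed point, the best response reproduces the mean field itself, mirroring the stationary MFG solution that the paper learns with deep reinforcement learning in place of an explicit best-response computation.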
Pages: 1140-1154
Page count: 15