Voting-Based Multiagent Reinforcement Learning for Intelligent IoT

Cited by: 12
Authors
Xu, Yue [1,2,3]
Deng, Zengde [4]
Wang, Mengdi [5,6]
Xu, Wenjun [1]
So, Anthony Man-Cho [4]
Cui, Shuguang [2,7]
Affiliations
[1] Beijing Univ Posts & Telecommun, Minist Educ, Key Lab Universal Wireless Commun, Beijing 100876, Peoples R China
[2] Chinese Univ Hong Kong, Shenzhen Res Inst Big Data, Shenzhen 518172, Peoples R China
[3] Chinese Univ Hong Kong, Sch Sci & Engn, Shenzhen 518172, Peoples R China
[4] Chinese Univ Hong Kong, Dept Syst Engn & Engn Management, Hong Kong, Peoples R China
[5] Princeton Univ, Dept Elect Engn, Ctr Stat & Machine Learning, Dept Operat Res & Financial Engn, Princeton, NJ 08544 USA
[6] Princeton Univ, Dept Comp Sci, Princeton, NJ 08544 USA
[7] Chinese Univ Hong Kong, Future Network Intelligence Inst, Shenzhen 518172, Peoples R China
Keywords
Convergence; Internet of Things; Optimization; Collaboration; Task analysis; Learning (artificial intelligence); Games; Multiagent reinforcement learning (MARL); primal-dual algorithm; voting mechanism; ALLOCATION; SYSTEMS; GAMES; EDGE
DOI
10.1109/JIOT.2020.3021017
CLC Classification Number
TP [Automation Technology, Computer Technology]
Subject Classification Number
0812
Abstract
The recent success of single-agent reinforcement learning (RL) in Internet of Things (IoT) systems motivates the study of multiagent RL (MARL), which is more challenging but better suited to large-scale IoT systems. In this article, we consider a voting-based MARL problem, in which the agents vote to make group decisions and the goal is to maximize the globally averaged return. To this end, we formulate the MARL problem based on the linear programming form of the policy optimization problem and propose a primal-dual algorithm to obtain the optimal solution. We also propose a voting mechanism through which distributed learning achieves the same sublinear convergence rate as centralized learning. In other words, distributed decision making does not slow down the process of achieving global consensus on optimality. Finally, we verify the convergence of the proposed algorithm with numerical simulations and conduct case studies in practical multiagent IoT systems.
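The abstract does not reproduce the formulation it refers to, but the linear programming form of average-reward policy optimization is standard. As a sketch (notation assumed here, not taken from the paper), with transition kernel $P$ and the globally averaged reward $\bar{r} = \frac{1}{N}\sum_{i=1}^{N} r_i$ over $N$ agents, the LP over stationary state-action occupancy measures $\mu$ reads:

\[
\begin{aligned}
\max_{\mu \ge 0}\;\; & \sum_{s,a} \mu(s,a)\,\bar{r}(s,a) \\
\text{s.t.}\;\; & \sum_{a} \mu(s',a) \;=\; \sum_{s,a} P(s' \mid s,a)\,\mu(s,a) \quad \forall\, s', \\
& \sum_{s,a} \mu(s,a) \;=\; 1,
\end{aligned}
\]

where an optimal policy is recovered via $\pi^*(a \mid s) \propto \mu^*(s,a)$. A primal-dual method of the kind the abstract describes would then run stochastic ascent in $\mu$ and descent in the dual variables attached to the flow constraints of this LP.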
Pages: 2681-2693
Number of pages: 13