Multi-Agent Reinforcement Learning Based Uplink OFDMA for IEEE 802.11ax Networks

Cited by: 2
Authors
Han, Mingqi [1 ]
Sun, Xinghua [1 ]
Zhan, Wen [1 ]
Gao, Yayu [2 ]
Jiang, Yuan [1 ]
Affiliations
[1] Sun Yat Sen Univ, Sch Elect & Commun Engn, Shenzhen Campus, Shenzhen 518107, Peoples R China
[2] Huazhong Univ Sci & Technol, Sch Elect Informat & Commun, Wuhan 430074, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Heuristic algorithms; Throughput; Uplink; Computational complexity; Sun; IEEE 802.11ax Standard; Optimization; Multiple access; multi-agent reinforcement learning; multi-objective reinforcement learning; mean-field reinforcement learning; DYNAMIC MULTICHANNEL ACCESS; MINIMIZATION; INFORMATION;
DOI
10.1109/TWC.2024.3355276
Chinese Library Classification
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology];
Discipline Codes
0808 ; 0809 ;
Abstract
In IEEE 802.11ax Wireless Local Area Networks (WLANs), Orthogonal Frequency Division Multiple Access (OFDMA) has been adopted to enable the high-throughput WLAN amendment. However, as the number of devices grows, it becomes difficult for the Access Point (AP) to schedule uplink transmissions, which calls for an efficient access mechanism in the OFDMA uplink system. Based on Multi-Agent Proximal Policy Optimization (MAPPO), we propose a Mean-Field Multi-Agent Proximal Policy Optimization (MFMAPPO) algorithm to improve throughput and guarantee fairness. Motivated by Mean-Field Games (MFGs) theory, a novel global state and action design is proposed to ensure the convergence of MFMAPPO in the massive access scenario. The Multi-Critic Single-Policy (MCSP) architecture is deployed in the proposed MFMAPPO so that each agent can learn the optimal channel access strategy to improve throughput while satisfying the fairness requirement. Extensive simulation experiments show that the MFMAPPO algorithm 1) has low computational complexity that increases linearly with the number of stations, 2) achieves nearly optimal throughput and fairness performance in the massive access scenario, and 3) can adapt, without retraining, to diverse and dynamic traffic conditions, including traffic different from that seen during training.
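The Multi-Critic Single-Policy idea mentioned in the abstract can be illustrated with a minimal sketch: each objective (throughput, fairness) gets its own critic and reward stream, and the per-objective advantages are merged into a single signal that trains one shared PPO policy. This is an illustrative sketch only, not the paper's implementation; the function names (`gae`, `mcsp_advantage`, `ppo_clip_loss`) and the fixed objective weights `w` are assumptions for demonstration.

```python
import numpy as np

def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized advantage estimation for one reward stream.
    `values` has one extra bootstrap entry at the end."""
    adv = np.zeros_like(rewards, dtype=float)
    last = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        last = delta + gamma * lam * last
        adv[t] = last
    return adv

def mcsp_advantage(throughput_rew, fairness_rew, v_tp, v_fair, w=(0.5, 0.5)):
    """Multi-Critic Single-Policy: one critic per objective; the
    per-objective advantages are normalized and combined into a
    single advantage driving the shared policy update."""
    a_tp = gae(throughput_rew, v_tp)
    a_fair = gae(fairness_rew, v_fair)
    # Normalize each stream so neither objective dominates the update.
    a_tp = (a_tp - a_tp.mean()) / (a_tp.std() + 1e-8)
    a_fair = (a_fair - a_fair.mean()) / (a_fair.std() + 1e-8)
    return w[0] * a_tp + w[1] * a_fair

def ppo_clip_loss(ratio, adv, eps=0.2):
    """Standard (MA)PPO clipped surrogate objective; `ratio` is
    pi_new(a|s) / pi_old(a|s) for the sampled actions."""
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * adv
    return -np.minimum(ratio * adv, clipped).mean()
```

In the actual MFMAPPO setting the critics would be neural networks conditioned on the mean-field global state, but the combination-of-advantages step is what lets one policy trade off the two objectives.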
Pages: 8868 - 8882
Number of pages: 15