Model-Free Learning of Optimal Ergodic Policies in Wireless Systems

Cited by: 7
Authors
Kalogerias, Dionysios S. [1 ]
Eisen, Mark [3 ]
Pappas, George J. [2 ]
Ribeiro, Alejandro [2 ]
Affiliations
[1] Michigan State Univ, Dept Elect & Comp Engn, E Lansing, MI 48824 USA
[2] Univ Penn, Dept Elect Syst Engn, Philadelphia, PA 19104 USA
[3] Intel Corp, Hillsboro, OR 97124 USA
Keywords
Wireless communication; Resource management; Smoothing methods; Stochastic processes; Fading channels; Approximation algorithms; Signal processing algorithms; Wireless systems; stochastic resource allocation; zeroth-order optimization; constrained nonconvex optimization; deep learning; Lagrangian duality; strong duality; RESOURCE-ALLOCATION; POWER ALLOCATION; NETWORKS; OPTIMIZATION; ACCESS;
DOI
10.1109/TSP.2020.3030073
CLC Classification
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Subject Classification
0808; 0809;
Abstract
Learning optimal resource allocation policies in wireless systems can be effectively achieved by formulating finite dimensional constrained programs which depend on system configuration, as well as the adopted learning parameterization. The interest here is in cases where system models are unavailable, prompting methods that probe the wireless system with candidate policies, and then use observed performance to determine better policies. This generic procedure is difficult because of the need to cull accurate gradient estimates out of these limited system queries. This article constructs and exploits smoothed surrogates of constrained ergodic resource allocation problems, the gradients of the former being representable exactly as averages of finite differences that can be obtained through limited system probing. Leveraging this unique property, we develop a new model-free primal-dual algorithm for learning optimal ergodic resource allocations, while we rigorously analyze the relationships between original policy search problems and their surrogates, in both primal and dual domains. First, we show that both primal and dual domain surrogates are uniformly consistent approximations of their corresponding original finite dimensional counterparts. Upon further assuming the use of near-universal policy parameterizations, we also develop explicit bounds on the gap between optimal values of initial, infinite dimensional resource allocation problems, and dual values of their parameterized smoothed surrogates. In fact, we show that this duality gap decreases at a linear rate relative to smoothing and universality parameters. Thus, it can be made arbitrarily small at will, also justifying our proposed primal-dual algorithmic recipe. Numerical simulations confirm the effectiveness of our approach.
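The abstract's core mechanism, probing the system at randomly perturbed policies and using finite differences to estimate the gradient of a smoothed Lagrangian surrogate, can be illustrated with a minimal single-variable sketch. Note this is a hedged toy example, not the paper's algorithm: the square-root utility, the unit resource budget, the step sizes, and the `probe` function are all illustrative assumptions.

```python
import random
import math

random.seed(0)

# Hypothetical black-box "wireless system": the learner can only probe it
# with a candidate power level theta and observe a noisy utility (rate)
# and a noisy resource cost; the true functional forms stay hidden.
def probe(theta):
    utility = math.sqrt(max(theta, 0.0)) + random.gauss(0.0, 0.001)
    cost = theta + random.gauss(0.0, 0.001)
    return utility, cost

BUDGET = 1.0      # ergodic resource budget: E[cost] <= BUDGET
MU = 0.05         # smoothing radius of the surrogate
LR_PRIMAL = 0.05  # primal (policy) step size
LR_DUAL = 0.05    # dual (multiplier) step size

theta, lam = 0.1, 0.0
for _ in range(20000):
    u_dir = random.choice([-1.0, 1.0])  # random probe direction
    # Two limited system probes yield a finite difference of the
    # Lagrangian, i.e. a stochastic estimate of the smoothed
    # surrogate's gradient at theta.
    r0, c0 = probe(theta)
    r1, c1 = probe(theta + MU * u_dir)
    L0 = r0 - lam * (c0 - BUDGET)
    L1 = r1 - lam * (c1 - BUDGET)
    grad_est = (L1 - L0) / MU * u_dir
    theta = max(theta + LR_PRIMAL * grad_est, 0.0)  # primal ascent
    _, c = probe(theta)
    lam = max(lam + LR_DUAL * (c - BUDGET), 0.0)    # dual ascent on violation

# For sqrt utility under a unit budget, the constrained optimum is
# theta* = 1 with multiplier lambda* = 0.5; the iterates should hover there.
print(f"theta={theta:.3f}, lambda={lam:.3f}")
```

The one-sided difference in a random ±1 direction averages out to a central difference of the smoothed objective, which is why limited probing suffices in place of an explicit system model.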
Pages: 6272-6286
Page count: 15
Related Papers
50 records in total
  • [1] MODEL-FREE LEARNING OF OPTIMAL DETERMINISTIC RESOURCE ALLOCATIONS IN WIRELESS SYSTEMS VIA ACTION-SPACE EXPLORATION
    Hashmi, Hassaan
    Kalogerias, Dionysios S.
    2021 IEEE 31ST INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING (MLSP), 2021,
  • [2] Model-free stochastic learning in adaptive wireless networks
    Chandramouli, R.
    2007 IEEE SARNOFF SYMPOSIUM, 2007, : 462 - 466
  • [3] Model-Free Reinforcement Learning by Embedding an Auxiliary System for Optimal Control of Nonlinear Systems
    Xu, Zhenhui
    Shen, Tielong
    Cheng, Daizhan
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022, 33 (04) : 1520 - 1534
  • [4] Model-free machine learning of wireless SISO/MIMO communications
    Garcia, Dolores
    Lacruz, Jesus O.
    Badini, Damiano
    De Donno, Danilo
    Widmer, Joerg
    COMPUTER COMMUNICATIONS, 2022, 181 : 192 - 202
  • [5] Optimal Online Learning Procedures for Model-Free Policy Evaluation
    Ueno, Tsuyoshi
    Maeda, Shin-ichi
    Kawanabe, Motoaki
    Ishii, Shin
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, PT II, 2009, 5782 : 473 - +
  • [6] On Model-Free Reinforcement Learning of Reduced-order Optimal Control for Singularly Perturbed Systems
    Mukherjee, Sayak
    Bai, He
    Chakrabortty, Aranya
    2018 IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2018, : 5288 - 5293
  • [7] Model-free optimal tracking policies for Markov jump systems by solving non-zero-sum games
    Zhou, Peixin
    Xue, Huiwen
    Wen, Jiwei
    Shi, Peng
    Luan, Xiaoli
    INFORMATION SCIENCES, 2023, 647
  • [8] Model-free Predictive Optimal Iterative Learning Control using Reinforcement Learning
    Zhang, Yueqing
    Chu, Bing
    Shu, Zhan
    2022 AMERICAN CONTROL CONFERENCE, ACC, 2022, : 3279 - 3284
  • [9] A Survey on Applications of Model-Free Strategy Learning in Cognitive Wireless Networks
    Wang, Wenbo
    Kwasinski, Andres
    Niyato, Dusit
    Han, Zhu
    IEEE COMMUNICATIONS SURVEYS AND TUTORIALS, 2016, 18 (03): : 1717 - 1757
  • [10] Model-Free design of control systems over wireless fading channels
    Lima, Vinicius
    Eisen, Mark
    Gatsis, Konstantinos
    Ribeiro, Alejandro
    SIGNAL PROCESSING, 2022, 197