Global and Local Convergence Analysis of a Bandit Learning Algorithm in Merely Coherent Games

Cited by: 0
Authors
Huang, Yuanhanqing [1 ]
Hu, Jianghai [1 ]
Affiliations
[1] Purdue Univ, Sch Elect & Comp Engn, W Lafayette, IN 47907 USA
Source
IEEE OPEN JOURNAL OF CONTROL SYSTEMS | 2023, Vol. 2
Funding
U.S. National Science Foundation
Keywords
Games; Convergence; Mirrors; Coherence; Linear programming; Heuristic algorithms; Control systems; Game theory; learning theory; optimization under uncertainties; stochastic systems; POWER;
DOI
10.1109/OJCSYS.2023.3316071
CLC number
TP [automation technology; computer technology]
Discipline code
0812
Abstract
Non-cooperative games serve as a powerful framework for capturing the interactions among self-interested players and have broad applicability in practical scenarios ranging from power management to path planning for self-driving vehicles. Although most existing solution algorithms assume the availability of first-order information or full knowledge of the objectives and others' action profiles, in some situations the only information accessible to the players is the realized objective function values. In this article, we devise a bandit online learning algorithm that integrates the optimistic mirror descent scheme with multi-point pseudo-gradient estimates. We further prove that the actual sequence of play generated by the algorithm converges almost surely to a critical point if the game under study is globally merely coherent, without resorting to extra Tikhonov regularization terms or additional norm conditions. We also discuss the convergence properties of the proposed bandit learning algorithm in locally merely coherent games. Finally, we illustrate the validity of the proposed algorithm via two two-player minimax problems and a cognitive radio bandwidth allocation game.
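The two ingredients named in the abstract — optimistic mirror descent and finite-difference pseudo-gradient estimation from realized objective values — can be illustrated with a minimal sketch. This is not the paper's algorithm: it uses the Euclidean (unconstrained) special case of mirror descent, a simple two-point sphere-sampling estimator, and a bilinear two-player minimax game as the test problem; the step size `eta` and query radius `delta` are illustrative choices.

```python
import numpy as np

def two_point_grad(f, x, delta, rng):
    """Two-point pseudo-gradient estimate of f at x from function values only:
    (d / (2*delta)) * (f(x + delta*u) - f(x - delta*u)) * u, u uniform on the sphere."""
    d = x.size
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)
    return (d / (2.0 * delta)) * (f(x + delta * u) - f(x - delta * u)) * u

# Bilinear minimax game f(x, y) = x^T y: player x minimizes, player y maximizes.
# Its unique critical point is (0, 0); plain gradient play cycles here, while
# the optimistic (past-extrapolation) variant converges.
f = lambda x, y: float(x @ y)

rng = np.random.default_rng(0)
x, y = np.array([1.0]), np.array([1.0])        # base states of the two players
gx_prev, gy_prev = np.zeros(1), np.zeros(1)    # previous pseudo-gradient estimates
eta, delta = 0.05, 1e-3

for t in range(4000):
    # Optimistic step: extrapolate using the previous round's gradient estimate.
    x_hat = x - eta * gx_prev
    y_hat = y + eta * gy_prev
    # Each player queries only its own realized objective values (bandit feedback).
    gx = two_point_grad(lambda z: f(z, y_hat), x_hat, delta, rng)
    gy = two_point_grad(lambda z: f(x_hat, z), y_hat, delta, rng)
    # Base update with the fresh estimates.
    x = x - eta * gx
    y = y + eta * gy
    gx_prev, gy_prev = gx, gy

print(abs(x[0]), abs(y[0]))  # both shrink toward the critical point (0, 0)
```

The "optimistic" extrapolation with the stale gradient is what stabilizes the rotation-dominated bilinear dynamics; dropping it and updating directly with `gx, gy` makes the iterates spiral outward on this game.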
Pages: 366-379
Page count: 14