Global and Local Convergence Analysis of a Bandit Learning Algorithm in Merely Coherent Games

Cited by: 0
Authors
Huang, Yuanhanqing [1 ]
Hu, Jianghai [1 ]
Affiliations
[1] Purdue Univ, Sch Elect & Comp Engn, W Lafayette, IN 47907 USA
Source
IEEE OPEN JOURNAL OF CONTROL SYSTEMS | 2023, Vol. 2
Funding
U.S. National Science Foundation
Keywords
Games; Convergence; Mirrors; Coherence; Linear programming; Heuristic algorithms; Control systems; Game theory; learning theory; optimization under uncertainties; stochastic systems; POWER;
DOI
10.1109/OJCSYS.2023.3316071
CLC number
TP [automation technology; computer technology]
Discipline code
0812
Abstract
Non-cooperative games serve as a powerful framework for capturing the interactions among self-interested players and have broad applicability in practical scenarios ranging from power management to path planning for self-driving vehicles. Although most existing solution algorithms assume the availability of first-order information or full knowledge of the objectives and others' action profiles, in some situations the only information accessible to the players is the realized objective function values. In this article, we devise a bandit online learning algorithm that integrates the optimistic mirror descent scheme with multi-point pseudo-gradient estimates. We further prove that the actual sequence of play generated by the algorithm converges almost surely to a critical point if the game under study is globally merely coherent, without resorting to extra Tikhonov regularization terms or additional norm conditions. We also discuss the convergence properties of the proposed bandit learning algorithm in locally merely coherent games. Finally, we illustrate the validity of the proposed algorithm via two two-player minimax problems and a cognitive radio bandwidth allocation game.
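The two ingredients named in the abstract — optimistic mirror descent and finite-difference pseudo-gradient estimation from realized objective values — can be illustrated with a minimal sketch. This is not the paper's algorithm: it uses the Euclidean (unconstrained) special case of mirror descent, a simple two-point sphere-sampling estimator, and a bilinear two-player minimax game as the test problem; the step size `eta` and query radius `delta` are illustrative choices.

```python
import numpy as np

def two_point_grad(f, x, delta, rng):
    """Two-point pseudo-gradient estimate of f at x from function values only:
    (d / (2*delta)) * (f(x + delta*u) - f(x - delta*u)) * u, u uniform on the sphere."""
    d = x.size
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)
    return (d / (2.0 * delta)) * (f(x + delta * u) - f(x - delta * u)) * u

# Bilinear minimax game f(x, y) = x^T y: player x minimizes, player y maximizes.
# Its unique critical point is (0, 0); plain gradient play cycles here, while
# the optimistic (past-extrapolation) variant converges.
f = lambda x, y: float(x @ y)

rng = np.random.default_rng(0)
x, y = np.array([1.0]), np.array([1.0])        # base states of the two players
gx_prev, gy_prev = np.zeros(1), np.zeros(1)    # previous pseudo-gradient estimates
eta, delta = 0.05, 1e-3

for t in range(4000):
    # Optimistic step: extrapolate using the previous round's gradient estimate.
    x_hat = x - eta * gx_prev
    y_hat = y + eta * gy_prev
    # Each player queries only its own realized objective values (bandit feedback).
    gx = two_point_grad(lambda z: f(z, y_hat), x_hat, delta, rng)
    gy = two_point_grad(lambda z: f(x_hat, z), y_hat, delta, rng)
    # Base update with the fresh estimates.
    x = x - eta * gx
    y = y + eta * gy
    gx_prev, gy_prev = gx, gy

print(abs(x[0]), abs(y[0]))  # both shrink toward the critical point (0, 0)
```

The "optimistic" extrapolation with the stale gradient is what stabilizes the rotation-dominated bilinear dynamics; dropping it and updating directly with `gx, gy` makes the iterates spiral outward on this game.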
Pages: 366-379
Page count: 14