Zeroth-Order Learning in Continuous Games via Residual Pseudogradient Estimates

Cited by: 0
Authors
Huang, Yuanhanqing [1]
Hu, Jianghai [1]
Affiliations
[1] Purdue Univ, Elmore Family Sch Elect & Comp Engn, W Lafayette, IN 47907 USA
Funding
U.S. National Science Foundation
Keywords
Games; Convergence; Optimization; Mirrors; Approximation algorithms; Linear programming; Heuristic algorithms; Vectors; Decision making; Wireless communication; Bandit methods; game theory; iterative learning control; multi-agent systems; optimization methods; variational inequality; VARIATIONAL-INEQUALITIES; EXTRAGRADIENT METHOD; CONVERGENCE; MONOTONE; CONVEX; OPTIMIZATION; POINT
DOI
10.1109/TAC.2024.3479874
Chinese Library Classification (CLC)
TP [Automation and Computer Technology]
Discipline code
0812
Abstract
A variety of practical problems can be modeled as decision-making processes in multiplayer games, where a group of self-interested players aim to optimize their own local objectives, which in turn depend on the actions taken by others. The local gradient information of each player, essential for implementing algorithms that find game solutions, is all too often unavailable. In this article, we focus on designing solution algorithms for multiplayer games under bandit feedback, i.e., the only feedback available to each player is the realized objective value. To address the large variance of existing bandit learning algorithms that use a single oracle call per iteration, we propose two algorithms that integrate the residual feedback scheme into single-call extragradient methods. We then show that the actual sequences of play converge almost surely to a critical point if the game is pseudomonotone plus, and we characterize the convergence rate to the critical point when the game is strongly pseudomonotone. As a supplement, we also investigate the ergodic convergence rates of the generated sequences in monotone games. Finally, the validity of the proposed algorithms is verified via numerical examples.
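The residual feedback idea referenced in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: it applies a standard residual-feedback one-point estimator with a plain gradient step (not the proposed single-call extragradient methods) to a single-player quadratic objective, and all names and constants are illustrative. The estimator reuses the previous iteration's realized objective value as a baseline, which shrinks the variance of the classic one-point estimator as consecutive queries get closer together.

```python
import numpy as np

def residual_feedback_grad(f, x, x_prev, u, u_prev, delta):
    """Residual-feedback pseudogradient estimate: the baseline is the
    previous query's (already observed) objective value, so in practice
    each step costs one fresh oracle call (here the previous value is
    recomputed for simplicity rather than cached)."""
    d = x.size
    return (d / delta) * (f(x + delta * u) - f(x_prev + delta * u_prev)) * u

rng = np.random.default_rng(0)

# Toy single-player "game": minimize f(x) = 0.5 ||x||^2 using only
# realized objective values (bandit feedback).
f = lambda x: 0.5 * float(np.dot(x, x))

x = np.array([2.0, -1.0])
x_prev = x.copy()
u_prev = np.zeros(2)          # no perturbation before the first query
delta, step = 0.1, 0.01       # exploration radius and step size

for _ in range(5000):
    u = rng.standard_normal(2)
    u /= np.linalg.norm(u)    # sample uniformly on the unit sphere
    g = residual_feedback_grad(f, x, x_prev, u, u_prev, delta)
    x_prev, u_prev = x, u
    x = x - step * g          # plain gradient step, not extragradient

print(np.linalg.norm(x))      # iterates shrink toward the minimizer 0
```

For this quadratic, the perturbation and residual terms both scale with the distance to the minimizer, so the noise vanishes as the iterates converge; this is the variance-reduction effect that motivates combining residual feedback with single-call extragradient updates in the paper.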
Pages: 2258-2273
Page count: 16