The Blessing of Heterogeneity in Federated Q-Learning: Linear Speedup and Beyond

Cited by: 0
Authors
Woo, Jiin [1 ]
Joshi, Gauri [1 ]
Chi, Yuejie [1 ]
Affiliations
[1] Carnegie Mellon Univ, Dept Elect & Comp Engn, Pittsburgh, PA 15213 USA
Keywords
federated RL; Q-learning; sample complexity; linear speedup; heterogeneity; stochastic approximation; convergence; rates
DOI
Not available
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
In this paper, we consider federated Q-learning, which aims to learn an optimal Q-function by periodically aggregating local Q-estimates trained on local data alone. Focusing on infinite-horizon tabular Markov decision processes, we provide sample complexity guarantees for both the synchronous and asynchronous variants of federated Q-learning, which exhibit a linear speedup with respect to the number of agents and near-optimal dependencies on other salient problem parameters. In the asynchronous setting, existing analyses of federated Q-learning, which adopt equally weighted averaging of local Q-estimates, require that every agent covers the entire state-action space. In contrast, our improved sample complexity scales inversely proportionally to the minimum entry of the average stationary state-action occupancy distribution of all agents, thus only requiring the agents to collectively cover the entire state-action space and unveiling the blessing of heterogeneity. However, this sample complexity still degrades when the local trajectories are highly heterogeneous. In response, we propose a novel federated Q-learning algorithm with importance averaging, which gives larger weights to more frequently visited state-action pairs and achieves a robust linear speedup as if all trajectories were centrally processed, regardless of the heterogeneity of local behavior policies.
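To make the aggregation rule concrete, below is a minimal Python sketch of one synchronization round of federated tabular Q-learning with importance averaging, based only on the high-level description in the abstract. The function names (local_q_update, importance_average, federated_round), the per-round visit-count weighting, and the step-size handling are illustrative assumptions, not the authors' exact algorithm.

import numpy as np

def local_q_update(Q, transitions, alpha, gamma):
    # Standard Q-learning updates over one agent's local transitions,
    # tracking how often each (state, action) pair is visited.
    visits = np.zeros(Q.shape, dtype=float)
    for s, a, r, s_next in transitions:
        td_target = r + gamma * Q[s_next].max()
        Q[s, a] += alpha * (td_target - Q[s, a])
        visits[s, a] += 1.0
    return Q, visits

def importance_average(local_Qs, local_visits):
    # Server-side aggregation: weight each agent's estimate of Q(s, a) by
    # that agent's share of the total visits to (s, a); fall back to equal
    # weights for pairs that no agent has visited in this round.
    Qs = np.stack(local_Qs)          # shape (num_agents, S, A)
    V = np.stack(local_visits)       # shape (num_agents, S, A)
    totals = V.sum(axis=0)
    weights = np.where(totals > 0, V / np.maximum(totals, 1e-12), 1.0 / len(local_Qs))
    return (weights * Qs).sum(axis=0)

def federated_round(global_Q, agent_transitions, alpha=0.1, gamma=0.99):
    # One synchronization round: every agent starts from the shared Q-table,
    # trains on its own trajectory, and the server aggregates the results.
    local_Qs, local_visits = [], []
    for transitions in agent_transitions:
        Q, visits = local_q_update(global_Q.copy(), transitions, alpha, gamma)
        local_Qs.append(Q)
        local_visits.append(visits)
    return importance_average(local_Qs, local_visits)

Under equally weighted averaging, every agent's estimate of Q(s, a) would receive weight 1/num_agents regardless of whether that agent ever visited (s, a), which is why each agent must cover the entire state-action space; the visit-proportional weights above only require the agents to cover it collectively.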
Pages: 1-85
Page count: 85