The Blessing of Heterogeneity in Federated Q-Learning: Linear Speedup and Beyond

被引:0
作者
Woo, Jiin [1 ]
Joshi, Gauri [1 ]
Chi, Yuejie [1 ]
机构
[1] Carnegie Mellon Univ, Dept Elect & Comp Engn, Pittsburgh, PA 15213 USA
关键词
federated RL; Q-learning; sample complexity; linear speedup; heterogeneity; STOCHASTIC-APPROXIMATION; CONVERGENCE; RATES;
D O I
暂无
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
In this paper, we consider federated Q-learning, which aims to learn an optimal Q-function by periodically aggregating local Q-estimates trained on local data alone. Focusing on infinite-horizon tabular Markov decision processes, we provide sample complexity guarantees for both the synchronous and asynchronous variants of federated Q-learning, which exhibit a linear speedup with respect to the number of agents and near-optimal dependencies on other salient problem parameters. In the asynchronous setting, existing analyses of federated Q-learning, which adopt an equally weighted averaging of local Q-estimates, require that every agent covers the entire state-action space. In contrast, our improved sample complexity scales inverse proportionally to the minimum entry of the average stationary state-action occupancy distribution of all agents, thus only requiring the agents to collectively cover the entire state-action space, unveiling the blessing of heterogeneity. However, its sample complexity still suffers when the local trajectories are highly heterogeneous. In response, we propose a novel federated Q-learning algorithm with importance averaging, giving larger weights to more frequently visited state-action pairs, which achieves a robust linear speedup as if all trajectories are centrally processed, regardless of the heterogeneity of local behavior policies.
引用
收藏
页码:1 / 85
页数:85
相关论文
共 50 条
[31]  
Qu G, 2020, PR MACH LEARN RES, V125
[32]  
Salgia S., 2024, 38 ANN C NEUR INF PR
[33]  
Shen H, 2022, Arxiv, DOI arXiv:2012.15511
[34]  
Shi Laixi, 2022, P MACHINE LEARNING R
[35]  
Sun J, 2020, PR MACH LEARN RES, V108
[36]  
Sutton RS, 2018, ADAPT COMPUT MACH LE, P1
[37]  
Szepesvari C, 1998, ADV NEUR IN, V10, P1064
[38]   ASYNCHRONOUS STOCHASTIC-APPROXIMATION AND Q-LEARNING [J].
TSITSIKLIS, JN .
MACHINE LEARNING, 1994, 16 (03) :185-202
[39]  
Wai HT, 2020, IEEE DECIS CONTR P, P4897, DOI [10.1109/CDC42340.2020.9304466, 10.1109/cdc42340.2020.9304466]
[40]  
Wainwright M. J., 2019, arXiv