Personalized federated reinforcement learning: Balancing personalization and experience sharing via distance constraint

Cited by: 6
Authors
Xiong, Weicheng [1 ]
Liu, Quan [1 ]
Li, Fanzhang [1 ]
Wang, Bangjun [1 ]
Zhu, Fei [1 ]
Affiliations
[1] Soochow Univ, Sch Comp Sci & Technol, Suzhou 215006, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Reinforcement learning; Federated learning; Personalization; Regularization; Distance constraint; Experience sharing;
DOI
10.1016/j.eswa.2023.122290
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Code
081104; 0812; 0835; 1405;
Abstract
Traditional federated reinforcement learning methods aim to find a single optimal global policy for all agents. However, due to the heterogeneity of agents' environments, this global policy is often only a suboptimal solution for individual agents. To resolve this problem, we propose a personalized federated reinforcement learning method, named perFedDC, which aims to establish an optimal personalized policy for each agent. Our method maintains a global model and multiple local models, using the ℓ2-norm to measure the distance between the global model and each local model. We introduce this distance as a regularization term in the local model update to prevent excessive policy updates. While the distance constraint facilitates experience sharing, it is important to strike an appropriate balance between personalization and sharing, so that agents benefit as much as possible from shared experience while still developing personalized policies. Experiments demonstrated that perFedDC accelerated agent training in a stable manner while maintaining the privacy constraints of federated learning. Furthermore, agents newly added to the federated system were able to quickly develop effective policies with the aid of the converged global policy.
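The core mechanism described above, a local update regularized by the squared ℓ2 distance to the global parameters, can be sketched briefly. The following is a minimal, hypothetical illustration, not the authors' implementation: it assumes PyTorch policy networks, and the function name `local_update`, the penalty weight `mu`, and the precomputed RL loss `rl_loss` are all assumptions, since the record gives no implementation details.

```python
import torch

def local_update(local_policy, global_policy, optimizer, rl_loss, mu=0.1):
    """One regularized local step (hypothetical sketch): the agent's RL
    loss is augmented with (mu/2) * ||theta_local - theta_global||_2^2,
    which pulls the personalized policy toward the shared global one."""
    # Squared l2 distance between local parameters and the (frozen)
    # global parameters; detach() keeps gradients off the global model.
    distance = sum(
        (lp - gp.detach()).pow(2).sum()
        for lp, gp in zip(local_policy.parameters(),
                          global_policy.parameters())
    )
    loss = rl_loss + 0.5 * mu * distance  # distance-constrained objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In such a scheme, a larger `mu` keeps local policies close to the global model (more sharing, less personalization), while `mu` near zero recovers fully independent local training; the paper's stated goal is to balance these two regimes.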
Pages: 10