Decentralized multi-task reinforcement learning policy gradient method with momentum over networks

Cited by: 2

Authors:

Shi Junru [1]

Wang Qiong [2]

Liu Muhua [1]

Ji Zhihang [1]

Zheng Ruijuan [1]

Wu Qingtao [1]

Affiliations:

[1] Henan Univ Sci & Technol, Sch Informat Engn, Luoyang 471023, Henan, Peoples R China

[2] Chinese Acad Social Sci, Inst Ind Econ, Beijing 100732, Peoples R China

Source:

APPLIED INTELLIGENCE | 2023 / Volume 53 / Issue 09

Funding:

National Natural Science Foundation of China;

Keywords:

Reinforcement learning; Distributed multi-task setting; Momentum; Policy gradient;

DOI:

10.1007/s10489-022-04028-8

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The policy gradient (PG) method is an effective way to find the optimal policy quickly in reinforcement learning problems: it parameterizes the policy and updates the policy parameters directly. Momentum methods are commonly employed to improve convergence when training centralized deep networks; they accelerate training by changing the descent direction of the gradients. However, decentralized variants of PG with momentum have rarely been investigated. For this reason, we propose a Decentralized Policy Gradient algorithm with Momentum, called DPGM, for solving multi-task reinforcement learning problems. We also rigorously analyze the convergence of DPGM and show that it attains a rate of O(1/T), where T denotes the number of iterations. This rate matches the state of the art for decentralized PG methods. Furthermore, we provide experimental verification in a decentralized reinforcement learning environment to support the theoretical result.
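The general idea of combining momentum with decentralized gradient updates over a network can be illustrated with a toy sketch. The code below is not the paper's DPGM algorithm, whose exact update rule is not given in this record. It is a generic heavy-ball momentum step combined with neighbor averaging on a ring of agents, and it assumes simple quadratic local objectives in place of reinforcement learning returns. The function name, the ring topology, the equal mixing weights, and the values of `alpha` and `beta` are all illustrative assumptions.

```python
# Toy sketch (assumption: NOT the paper's DPGM update rule).
# Each agent i maximizes a local objective J_i(theta) = -0.5 * (theta - c_i)^2,
# a stand-in for a task-specific expected return. The network-wide goal is to
# maximize the average of the J_i, whose maximizer is the mean of the c_i.


def run_decentralized_momentum(targets, steps=2000, alpha=0.05, beta=0.9):
    """Gradient ascent with heavy-ball momentum and ring averaging.

    targets: list of c_i values, one per agent (at least 3 agents).
    alpha:   step size (hypothetical value).
    beta:    momentum coefficient (hypothetical value).
    """
    n = len(targets)
    theta = [0.0] * n      # each agent's local copy of the parameter
    velocity = [0.0] * n   # each agent's local momentum buffer

    for _ in range(steps):
        # Local gradient of J_i at the agent's own parameter.
        grads = [targets[i] - theta[i] for i in range(n)]

        # Heavy-ball momentum: accumulate an exponentially weighted gradient sum.
        velocity = [beta * velocity[i] + grads[i] for i in range(n)]

        # Consensus step: average with the two ring neighbors.
        # Equal weights of 1/3 give a doubly stochastic mixing matrix.
        mixed = [
            (theta[(i - 1) % n] + theta[i] + theta[(i + 1) % n]) / 3.0
            for i in range(n)
        ]

        # Move the mixed parameter along the local momentum direction.
        theta = [mixed[i] + alpha * velocity[i] for i in range(n)]

    return theta


TARGETS = [1.0, 2.0, 3.0, 6.0]
final_theta = run_decentralized_momentum(TARGETS)
network_average = sum(final_theta) / len(final_theta)
print("per-agent parameters:", final_theta)
print("network average:", network_average)
```

In this toy setting the network average of the parameters approaches the mean of the targets, while the individual agents settle near, but not exactly at, a common value because the step size is held constant.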

Citation

Pages: 10365-10379

Page count: 15

