Multiagent value iteration algorithms in dynamic programming and reinforcement learning

被引：15

作者：

Bertsekas, Dimitri ^{[1
,2
]}

机构：

[1] MIT, Cambridge, MA USA

[2] ASU, Computat Decis Makingg, Tempe, AZ 85287 USA

来源：

RESULTS IN CONTROL AND OPTIMIZATION | 2020年 / 1卷

关键词：

BY-PERSON OPTIMIZATION; STATIC TEAM PROBLEMS; COMMON INFORMATION;

D O I：

10.1016/j.rico.2020.100003

中图分类号：

O29 [应用数学];

学科分类号：

070104 ;

摘要：

We consider infinite horizon dynamic programming problems, where the control at each stage consists of several distinct decisions, each one made by one of several agents. In an earlier work we introduced a policy iteration algorithm, where the policy improvement is done one -agent -at -a -time in a given order, with knowledge of the choices of the preceding agents in the order. As a result, the amount of computation for each policy improvement grows linearly with the number of agents, as opposed to exponentially for the standard all -agents -at -once method. For the case of a finite -state discounted problem, we showed convergence to an agent -by -agent optimal policy. In this paper, this result is extended to value iteration and optimistic versions of policy iteration, as well as to more general DP problems where the Bellman operator is a contraction mapping, such as stochastic shortest path problems with all policies being proper.

引用

页数：10

共 30 条

[11]

Bertsekas DP., 2016, Nonlinear Programming, V3

[12]

Bertsekas DP, 2020, IEEE/CAA J Autom Sin

[13]

Bertsekas DP, 2010, Williams-baird counterexample for Q-factor asynchronous policy iteration

[14] A class of team problems with discrete action spaces: Optimality conditions based on multimodularity [J].

De Waal, PR ;

Schuppen, JHA .

SIAM JOURNAL ON CONTROL AND OPTIMIZATION, 2000, 38 (03) :875-892

[15] EXISTENCE OF TEAM-OPTIMAL SOLUTIONS IN STATIC TEAMS WITH COMMON INFORMATION: A TOPOLOGY OF INFORMATION APPROACH [J].

Gupta, Abhishek .

SIAM JOURNAL ON CONTROL AND OPTIMIZATION, 2020, 58 (02) :998-1021

[16] TEAM DECISION-THEORY AND INFORMATION STRUCTURES [J].

HO, YC .

PROCEEDINGS OF THE IEEE, 1980, 68 (06) :644-654

[17] STATIC TEAM PROBLEMS .1. SUFFICIENT CONDITIONS AND THE EXPONENTIAL COST CRITERION [J].

KRAINAK, JC ;

SPEYER, JL ;

MARCUS, SI .

IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 1982, 27 (04) :839-848

[18] STATIC TEAM PROBLEMS .2. AFFINE CONTROL LAWS, PROJECTIONS, ALGORITHMS, AND THE LEGT PROBLEM [J].

KRAINAK, JC ;

SPEYER, JL ;

MARCUS, SI .

IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 1982, 27 (04) :848-859

[19]

Li YY, 2020, Arxiv, DOI arXiv:1912.09135

[20] ELEMENTS FOR A THEORY OF TEAMS [J].

Marschak, J. .

MANAGEMENT SCIENCE, 1955, 1 (02) :127-137

← 1 2 3 →