30 entries in total
[11]
Kakade S, 2001, LECT NOTES ARTIF INT, V2111, P605
[12]
Kakade Sham, 2002, P 19 INT C MACHINE L, P267
[13]
Konda VR, 2000, ADV NEUR IN, V12, P1008
[14]
Lowe R, 2017, ADV NEUR IN, V30
[16]
Approximate gradient methods in policy-space optimization of Markov reward processes [J]. DISCRETE EVENT DYNAMIC SYSTEMS-THEORY AND APPLICATIONS, 2003, 13(1-2): 111-148
[17]
Marcin Andrychowicz, 2020, ARXIV200605990
[18]
Puterman M.L., 1994, Markov decision processes: discrete stochastic dynamic programming
[19]
Schulman, 2017, ARXIV
[20]
Schulman J, 2015, PR MACH LEARN RES, V37, P1889