Proximal algorithms and temporal difference methods for solving fixed point problems

被引：0

作者：

Dimitri P. Bertsekas

机构：

[1] M.I.T.,Laboratory for Information and Decision Systems, Department of Electrical Engineering and Computer Science

来源：

Computational Optimization and Applications | 2018年 / 70卷

关键词：

Proximal algorithm; Temporal differences; Dynamic programming; Convex optimization; Fixed point problems;

D O I：

暂无

中图分类号：

学科分类号：

摘要：

In this paper we consider large fixed point problems and solution with proximal algorithms. We show that for linear problems there is a close connection between proximal iterations, which are prominent in numerical analysis and optimization, and multistep methods of the temporal difference type such as TD(λ\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\lambda $$\end{document}), LSTD(λ\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\lambda $$\end{document}), and LSPE(λ\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\lambda $$\end{document}), which are central in simulation-based exact and approximate dynamic programming. One benefit of this connection is a new and simple way to accelerate the standard proximal algorithm by extrapolation towards a multistep iteration, which generically has a faster convergence rate. Another benefit is the potential for integration into the proximal algorithmic context of several new ideas that have emerged in the approximate dynamic programming context, including simulation-based implementations. Conversely, the analytical and algorithmic insights from proximal algorithms can be brought to bear on the analysis and the enhancement of temporal difference methods. We also generalize our linear case result to nonlinear problems that involve a contractive mapping, thus providing guaranteed and potentially substantial acceleration of the proximal and forward backward splitting algorithms at no extra cost. Moreover, under certain monotonicity assumptions, we extend the connection with temporal difference methods to nonlinear problems through a linearization approach.

引用

页码：709 / 736

页数：27

共 87 条

[41]

Lewis AS(2010)Convergence results for some temporal difference methods based on least squares Math. Oper. Res. 35 306-329

[42]

Lions PL(2013)Error bounds for approximations from projected linear equations Ann. Oper. Res. 208 95-132

[43]

Mercier B(2012)Q-learning and policy iteration algorithms for stochastic shortest path problems SIAM J. Control Optim. 50 3310-3343

[44]

Mnih V(undefined)Least squares temporal difference methods: an analysis under general conditions undefined undefined undefined-undefined

[45]

Kavukcuoglu K(undefined)undefined undefined undefined undefined-undefined

[46]

Silver D(undefined)undefined undefined undefined undefined-undefined

[47]

Martinet B(undefined)undefined undefined undefined undefined-undefined

[48]

Nedić A(undefined)undefined undefined undefined undefined-undefined

[49]

Bertsekas DP(undefined)undefined undefined undefined undefined-undefined

[50]

Parikh N(undefined)undefined undefined undefined undefined-undefined

← 1 2 3 4 5 6 7 8 9 →