Proximal algorithms and temporal difference methods for solving fixed point problems

Cited by: 0
Author
Dimitri P. Bertsekas
Affiliation
[1] Laboratory for Information and Decision Systems, Department of Electrical Engineering and Computer Science, M.I.T.
Source
Computational Optimization and Applications | 2018 / Vol. 70
Keywords
Proximal algorithm; Temporal differences; Dynamic programming; Convex optimization; Fixed point problems
Abstract
In this paper we consider large fixed point problems and their solution with proximal algorithms. We show that, for linear problems, there is a close connection between proximal iterations, which are prominent in numerical analysis and optimization, and multistep methods of the temporal difference type, such as TD(λ), LSTD(λ), and LSPE(λ), which are central in simulation-based exact and approximate dynamic programming. One benefit of this connection is a new and simple way to accelerate the standard proximal algorithm by extrapolation towards a multistep iteration, which generically has a faster convergence rate. Another benefit is the potential for integration into the proximal algorithmic context of several new ideas that have emerged in the approximate dynamic programming context, including simulation-based implementations. Conversely, the analytical and algorithmic insights from proximal algorithms can be brought to bear on the analysis and enhancement of temporal difference methods. We also generalize our linear-case result to nonlinear problems that involve a contractive mapping, thus providing guaranteed and potentially substantial acceleration of the proximal and forward-backward splitting algorithms at no extra cost. Moreover, under certain monotonicity assumptions, we extend the connection with temporal difference methods to nonlinear problems through a linearization approach.
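For illustration only, the following minimal Python sketch shows the extrapolation idea described in the abstract under assumptions the abstract itself does not spell out: a linear fixed point problem x = Ax + b with a contractive matrix A, a proximal mapping of the form P_c(x) = ((1 + 1/c)I - A)^{-1}(b + x/c), and the extrapolated step x_{k+1} = P_c(x_k) + (1/c)(P_c(x_k) - x_k), which in this linear setting corresponds to a multistep mapping with λ = c/(c + 1). The problem instance and all names are illustrative, not taken from the paper.

```python
import numpy as np

# Sketch: solve x = A x + b for a contractive A, comparing the plain
# proximal iteration with the extrapolated iteration
#   x_{k+1} = P_c(x_k) + (1/c) * (P_c(x_k) - x_k),
# which in the linear case corresponds to a multistep (TD-type) mapping
# with lambda = c / (c + 1).

rng = np.random.default_rng(0)
n, c = 20, 4.0

M = rng.standard_normal((n, n))
A = 0.9 * M / np.linalg.norm(M, 2)          # contraction: spectral norm < 1
b = rng.standard_normal(n)
x_star = np.linalg.solve(np.eye(n) - A, b)  # exact fixed point for reference

# Proximal mapping for the linear problem: y solves ((1 + 1/c) I - A) y = b + x / c
P = np.linalg.inv((1.0 + 1.0 / c) * np.eye(n) - A)

def prox(x):
    return P @ (b + x / c)

x_prox = np.zeros(n)
x_extra = np.zeros(n)
for k in range(30):
    x_prox = prox(x_prox)                    # standard proximal step
    p = prox(x_extra)
    x_extra = p + (1.0 / c) * (p - x_extra)  # extrapolated (multistep) step

print("proximal error:     ", np.linalg.norm(x_prox - x_star))
print("extrapolated error: ", np.linalg.norm(x_extra - x_star))
```

In runs of this sketch the extrapolated iterate reaches a given accuracy in noticeably fewer iterations than the plain proximal iterate, consistent with the acceleration claim in the abstract; the extrapolation costs nothing beyond the proximal step already computed.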
Pages: 709-736
Number of pages: 27