Stochastic variable metric proximal gradient with variance reduction for non-convex composite optimization

Cited by: 0
Authors
Gersende Fort
Eric Moulines
Affiliations
[1] CNRS & Université de Toulouse, Institut de Mathématiques de Toulouse
[2] Ecole Polytechnique, CMAP
Keywords
Stochastic optimization; Variable metric forward–backward splitting; Preconditioned stochastic gradient descent; Incremental expectation maximization; Proximal methods; Variance reduction; Non-asymptotic convergence bounds
DOI
Not available
Abstract
This paper introduces a novel algorithm, the Perturbed Proximal Preconditioned SPIDER algorithm (3P-SPIDER), designed to solve finite-sum non-convex composite optimization problems. It is a stochastic Variable Metric Forward–Backward algorithm that allows an approximate preconditioned forward operator and uses a variable metric proximity operator as the backward operator; it also employs a mini-batch strategy with variance reduction to address the finite-sum setting. We show that 3P-SPIDER extends some stochastic preconditioned gradient descent-based algorithms and some incremental Expectation Maximization algorithms to composite optimization and to the case where the forward operator cannot be computed in closed form. We also provide an explicit control of the convergence in expectation of 3P-SPIDER and study its complexity to satisfy an approximate epsilon-stationarity condition. Our results are the first to combine the non-convex composite optimization setting, a variance-reduction technique that tackles the finite-sum setting via a mini-batch strategy, and deterministic or random approximations of the preconditioned forward operator. Finally, through an application to inference in a logistic regression model with random effects, we numerically compare 3P-SPIDER to other stochastic forward–backward algorithms and discuss the role of some of its design parameters.
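
To make the generic iteration concrete, the sketch below combines the two ingredients the abstract names: a SPIDER-type variance-reduced gradient estimator (forward step, preconditioned by a metric B) and a variable-metric proximity operator (backward step). It is a minimal illustration under stated assumptions, not the authors' 3P-SPIDER: the problem (l1-penalized logistic regression), the fixed diagonal metric, the step size gamma, and all function names (spider_vm_prox_grad, prox_vm_l1, grad_logistic) are illustrative choices, and the paper's deterministic or random perturbation of the forward operator is omitted here.

# Minimal sketch (illustrative, not the authors' code) of a SPIDER-type
# variance-reduced, variable-metric proximal gradient iteration for
# min_x (1/n) sum_i f_i(x) + lam*||x||_1, with f_i a logistic loss and a
# fixed diagonal metric B = diag(d). Names and constants are assumptions.
import numpy as np

def prox_vm_l1(u, lam, gamma, d):
    # Variable-metric prox of g = lam*||.||_1 under B = diag(d):
    #   argmin_x lam*||x||_1 + (1/(2*gamma)) (x-u)^T B (x-u),
    # i.e. coordinatewise soft-thresholding at lam*gamma/d_j.
    thresh = lam * gamma / d
    return np.sign(u) * np.maximum(np.abs(u) - thresh, 0.0)

def grad_logistic(X, y, x):
    # Gradient of the mean logistic loss, labels y in {-1,+1}.
    z = np.clip(-y * (X @ x), -30.0, 30.0)
    s = 1.0 / (1.0 + np.exp(-z))
    return X.T @ (-y * s) / X.shape[0]

def spider_vm_prox_grad(X, y, lam=0.01, gamma=0.5, epochs=10, batch=32, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Diagonal curvature proxy used as the metric (an illustrative choice).
    d = 0.25 * np.mean(X**2, axis=0) + 1e-3
    x = np.zeros(p)
    for _ in range(epochs):
        v = grad_logistic(X, y, x)        # SPIDER anchor: full gradient
        x_prev = x.copy()
        for _ in range(n // batch):
            idx = rng.choice(n, size=batch, replace=False)
            # SPIDER recursion: correct v with mini-batch gradient differences.
            v += grad_logistic(X[idx], y[idx], x) - grad_logistic(X[idx], y[idx], x_prev)
            x_prev = x.copy()
            # Preconditioned forward step, then variable-metric backward step.
            x = prox_vm_l1(x - gamma * v / d, lam, gamma, d)
    return x

# Toy usage: recover a sparse ground truth from synthetic data.
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 20))
w = np.zeros(20); w[:3] = [2.0, -1.5, 1.0]
y = np.sign(X @ w + 0.1 * rng.standard_normal(500))
print("nonzeros:", int(np.sum(np.abs(spider_vm_prox_grad(X, y)) > 1e-6)))

The per-epoch full-gradient anchor followed by mini-batch difference corrections is what reduces the variance of the gradient estimator relative to plain stochastic gradients; dividing by d before the prox is the preconditioned forward step, and the thresholds lam*gamma/d_j realize the backward step in the same metric.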