Stochastic optimization algorithm with variance reduction for solving non-smooth problems

Cited: 0
Authors
Zhu, Xiao-Hui [1 ]
Tao, Qing [1 ]
Shao, Yan-Jian [1 ]
Chu, De-Jun [1 ]
Affiliation
[1] 1st Department, Army Officer Academy of PLA, Hefei
Source
Ruan Jian Xue Bao/Journal of Software | 2015, Vol. 26, No. 11
Keywords
Composite objective mirror descent (COMID); Machine learning; Non-smooth; Stochastic algorithm; Variance
DOI
10.13328/j.cnki.jos.004890
Abstract
Stochastic optimization is one of the most efficient methods for solving large-scale machine learning problems. In stochastic learning, the full gradient is replaced by the gradient of the loss function on a single randomly selected sample to avoid high computational cost; however, this substitution usually introduces large variance. Recent studies have shown that the convergence rate of stochastic gradient methods can be effectively improved by reducing this variance. This paper addresses the variance reduction issue in COMID (composite objective mirror descent) when solving non-smooth optimization problems. First, a proof is provided showing that COMID has a convergence rate of O(1/√T + σ²/√T) in terms of the variance, where T is the number of iterations and σ² is the variance; this rate guarantees that reducing the variance is effective. Then, a stochastic optimization algorithm, α-MDVR (mirror descent with variance reduction), is obtained by incorporating a variance-reduction strategy into COMID. Unlike Prox-SVRG (proximal stochastic variance reduced gradient), α-MDVR does not depend on the number of samples and uses only a small portion of the samples to correct the gradient. Comparative experiments demonstrate that α-MDVR not only reduces the variance but also decreases the computation time. © Copyright 2015, Institute of Software, the Chinese Academy of Sciences. All rights reserved.
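A minimal sketch of the idea the abstract describes, assuming a Euclidean mirror map (so the COMID step reduces to a proximal soft-thresholding update for an L1 composite term) and an SVRG-style correction in which the snapshot gradient is estimated on a small random mini-batch rather than the full sample set, mirroring the claim that α-MDVR uses only a small portion of the samples. The names `alpha_mdvr_sketch`, the batch size `b`, and the step size `eta` are illustrative assumptions, not the paper's exact algorithm.

```python
# Sketch (assumed, not the authors' code): variance-reduced stochastic
# composite mirror descent on least-squares loss + L1 regularizer.
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t * ||.||_1, the composite (non-smooth) term."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def alpha_mdvr_sketch(A, y, lam=0.1, eta=0.01, b=8, epochs=5, inner=100, seed=0):
    """Variance-reduced COMID-style iteration (Euclidean mirror map).

    Unlike full SVRG/Prox-SVRG, the snapshot gradient `mu` is estimated on a
    random mini-batch of b samples, so its cost does not grow with n.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    w = np.zeros(d)
    grad = lambda v, i: A[i] * (A[i] @ v - y[i])  # per-sample gradient
    for _ in range(epochs):
        w_snap = w.copy()
        batch = rng.choice(n, size=b, replace=False)
        mu = np.mean([grad(w_snap, i) for i in batch], axis=0)  # snapshot estimate
        for _ in range(inner):
            i = rng.integers(n)
            g = grad(w, i) - grad(w_snap, i) + mu   # variance-reduced gradient
            w = soft_threshold(w - eta * g, eta * lam)  # composite mirror step
    return w
```

The correction term `grad(w, i) - grad(w_snap, i) + mu` keeps the stochastic gradient unbiased (up to the mini-batch snapshot estimate) while shrinking its variance as the iterate approaches the snapshot, which is the mechanism the paper's convergence bound O(1/√T + σ²/√T) rewards.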
Pages: 2752-2761 (9 pages)
References (19 entries)
[1] Tseng P., Approximation accuracy, gradient methods, and error bound for structured convex optimization, Mathematical Programming, 125, 2, pp. 263-295, (2010)
[2] Nemirovski A., Juditsky A., Lan G., Shapiro A., Robust stochastic approximation approach to stochastic programming, SIAM Journal on Optimization, 19, 4, pp. 1574-1609, (2009)
[3] Shalev-Shwartz S., Tewari A., Stochastic methods for L1 regularized loss minimization, Proc. of the 26th Annual Int'l Conf. on Machine Learning, pp. 929-936, (2009)
[4] Johnson R., Zhang T., Accelerating stochastic gradient descent using predictive variance reduction, Proc. of the Advances in Neural Information Processing Systems, 26, pp. 315-323, (2013)
[5] Shalev-Shwartz S., Zhang T., Stochastic dual coordinate ascent methods for regularized loss minimization, (2012)
[6] Le Roux N., Schmidt M., Bach F., A stochastic gradient method with an exponential convergence rate for strongly convex optimization with finite training sets, (2012)
[7] Xiao L., Dual averaging methods for regularized stochastic learning and online optimization, Advances in Neural Information Processing Systems, pp. 2116-2124, (2009)
[8] Xiao L., Zhang T., A proximal stochastic gradient method with progressive variance reduction, (2014)
[9] Duchi J., Shalev-Shwartz S., Singer Y., Tewari A., Composite objective mirror descent, Proc. of the 23rd Annual Workshop on Computational Learning Theory, pp. 116-128, (2010)
[10] Duchi J., Shalev-Shwartz S., Singer Y., Efficient projections onto the L1-ball for learning in high dimensions, Proc. of the 25th Int'l Conf. on Machine Learning, pp. 272-279, (2008)