Proximal Algorithms in Statistics and Machine Learning

Cited by: 98
Authors
Polson, Nicholas G. [1]
Scott, James G. [3,4]
Willard, Brandon T. [2]
Affiliations
[1] Univ Chicago, Econometr & Stat, Chicago, IL 60637 USA
[2] Univ Chicago, Booth Sch Business, Chicago, IL 60637 USA
[3] Univ Texas Austin, Stat, McCombs Sch Business, Austin, TX 78712 USA
[4] Univ Texas Austin, Dept Stat & Data Sci, Austin, TX 78712 USA
Funding
U.S. National Science Foundation
Keywords
Bayes MAP; shrinkage; sparsity; splitting; Kurdyka-Lojasiewicz; nonconvex; envelopes; regularization; ADMM; optimization; Divide and Concur; convex optimization; maximum likelihood; descent methods; EM algorithm; convergence; minimization; decomposition; selection; recovery
DOI
10.1214/15-STS530
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject Classification Codes
020208; 070103; 0714
Abstract
Proximal algorithms are useful for obtaining solutions to difficult optimization problems, especially those involving nonsmooth or composite objective functions. A proximal algorithm is one whose basic iterations involve the proximal operator of some function, whose evaluation requires solving a specific optimization problem that is typically easier than the original problem. Many familiar algorithms can be cast in this form, and this "proximal view" turns out to provide a set of broad organizing principles for many algorithms useful in statistics and machine learning. In this paper, we show how a number of recent advances in this area can inform modern statistical practice. We focus on several main themes: (1) variable splitting strategies and the augmented Lagrangian; (2) the broad utility of envelope (or variational) representations of objective functions; (3) proximal algorithms for composite objective functions; and (4) the surprisingly large number of functions for which there are closed-form solutions of proximal operators. We illustrate our methodology with regularized logistic and Poisson regression incorporating a nonconvex bridge penalty and a fused lasso penalty. We also discuss several related issues, including the convergence of nondescent algorithms, acceleration, and optimization for nonconvex functions. Finally, we provide directions for future research in this exciting area at the intersection of statistics and optimization.
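As a minimal sketch of the terminology in the abstract (standard definitions, not excerpted from the paper itself), the proximal operator of a function f with parameter \lambda > 0 is

\operatorname{prox}_{\lambda f}(x) = \arg\min_{z}\Big\{ f(z) + \tfrac{1}{2\lambda}\,\|z - x\|_2^2 \Big\}.

Theme (4) refers to cases where this subproblem has a closed form; the best-known instance is the \ell_1 (lasso) penalty, whose proximal operator is componentwise soft thresholding:

\big[\operatorname{prox}_{\lambda\|\cdot\|_1}(x)\big]_i = \operatorname{sign}(x_i)\,\max\{|x_i| - \lambda,\, 0\}.

Theme (3) concerns composite objectives \ell(x) + \phi(x) with \ell smooth and \phi possibly nonsmooth (e.g., a penalized negative log-likelihood), for which the basic proximal gradient iteration with step size \gamma is

x^{(k+1)} = \operatorname{prox}_{\gamma\phi}\!\big(x^{(k)} - \gamma\,\nabla\ell(x^{(k)})\big).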
Pages: 559-581
Number of pages: 23