STATISTICAL INFERENCE FOR MODEL PARAMETERS IN STOCHASTIC GRADIENT DESCENT

Cited by: 64
Authors
Chen, Xi [1 ]
Lee, Jason D. [2 ]
Tong, Xin T. [3 ]
Zhang, Yichen [1 ]
Affiliations
[1] NYU, Stern Sch Business, Dept Technol Operat & Stat, New York, NY 10003 USA
[2] Univ Southern Calif, Data Sci & Operat, Marshall Sch Business, Los Angeles, CA 90007 USA
[3] Natl Univ Singapore, Dept Math, Singapore, Singapore
Keywords
Stochastic gradient descent; asymptotic variance; batch-means estimator; high-dimensional inference; time-inhomogeneous Markov chain; CONFIDENCE-INTERVALS; OUTPUT ANALYSIS; VARIANCE; APPROXIMATION; ESTIMATORS; SELECTION;
DOI
10.1214/18-AOS1801
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Subject Classification Codes
020208 ; 070103 ; 0714 ;
Abstract
The stochastic gradient descent (SGD) algorithm has been widely used in statistical estimation for large-scale data due to its computational and memory efficiency. While most existing works focus on the convergence of the objective function or the error of the obtained solution, we investigate the problem of statistical inference of true model parameters based on SGD when the population loss function is strongly convex and satisfies certain smoothness conditions. Our main contributions are twofold. First, in the fixed dimension setup, we propose two consistent estimators of the asymptotic covariance of the average iterate from SGD: (1) a plug-in estimator, and (2) a batch-means estimator, which is computationally more efficient and only uses the iterates from SGD. Both proposed estimators allow us to construct asymptotically exact confidence intervals and hypothesis tests. Second, for high-dimensional linear regression, using a variant of the SGD algorithm, we construct a debiased estimator of each regression coefficient that is asymptotically normal. This gives a one-pass algorithm for computing both the sparse regression coefficients and confidence intervals, which is computationally attractive and applicable to online data.
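The batch-means construction described in the abstract can be illustrated with a small sketch: run SGD with a decaying step size on a toy linear regression, average the iterates (Polyak-Ruppert averaging), and estimate the asymptotic covariance of the averaged iterate from batch means of the trajectory. This is an illustrative simplification, not the authors' implementation — in particular it uses equal-size batches, whereas the paper's consistency result relies on carefully increasing batch sizes; all data and parameter choices below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear regression data (illustrative, not the paper's experiments):
# y_t = x_t' theta* + eps_t with E[x x'] = I and Var(eps) = 1.
d, n = 5, 100_000
theta_star = np.ones(d)
X = rng.normal(size=(n, d))
y = X @ theta_star + rng.normal(size=n)

# SGD with decaying step size eta_t = c * t^(-alpha), alpha in (1/2, 1),
# followed by Polyak-Ruppert averaging of the iterates.
c, alpha = 0.1, 0.7
theta = np.zeros(d)
iterates = np.empty((n, d))
for t in range(n):
    eta = c * (t + 1) ** (-alpha)
    grad = (X[t] @ theta - y[t]) * X[t]  # stochastic gradient of squared loss
    theta -= eta * grad
    iterates[t] = theta
theta_bar = iterates.mean(axis=0)

# Batch-means covariance estimate: split the trajectory into M batches
# (equal sizes here for simplicity; the paper grows batch sizes to obtain
# consistency under decaying step sizes) and use the spread of batch means.
M = 20
b = n // M
batch_means = iterates[: M * b].reshape(M, b, d).mean(axis=1)
centered = batch_means - theta_bar
Sigma_hat = b * (centered.T @ centered) / (M - 1)  # estimates n * Cov(theta_bar)

# Asymptotically normal averaged iterate gives per-coordinate 95% intervals.
half = 1.96 * np.sqrt(np.diag(Sigma_hat) / n)
ci = np.stack([theta_bar - half, theta_bar + half], axis=1)
print("averaged iterate:", np.round(theta_bar, 3))
```

The averaged iterate is the point estimate; the batch-means matrix replaces the plug-in estimator when only the SGD trajectory is available, which is what makes it attractive in the online setting.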
Pages: 251-273
Page count: 23