OPTIMAL SELECTION OF REDUCED RANK ESTIMATORS OF HIGH-DIMENSIONAL MATRICES

Cited by: 155
Authors
Bunea, Florentina [1 ]
She, Yiyuan [1 ]
Wegkamp, Marten H. [1 ]
Affiliations
[1] Florida State Univ, Dept Stat, Tallahassee, FL 32306 USA
Funding
U.S. National Science Foundation
Keywords
Multivariate response regression; reduced rank estimators; dimension reduction; rank selection; adaptive estimation; oracle inequalities; nuclear norm; low rank matrix approximation
DOI
10.1214/11-AOS876
Chinese Library Classification (CLC)
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject Classification Codes
020208 ; 070103 ; 0714 ;
Abstract
We introduce a new criterion, the Rank Selection Criterion (RSC), for selecting the optimal reduced rank estimator of the coefficient matrix in multivariate response regression models. The corresponding RSC estimator minimizes the Frobenius norm of the fit plus a regularization term proportional to the number of parameters in the reduced rank model. The rank of the RSC estimator provides a consistent estimator of the rank of the coefficient matrix; in general, the rank of our estimator is a consistent estimate of the effective rank, which we define to be the number of singular values of the target matrix that are appropriately large. The consistency results are valid not only in the classic asymptotic regime, when n, the number of responses, and p, the number of predictors, stay bounded, and m, the number of observations, grows, but also when either, or both, n and p grow, possibly much faster than m. We establish minimax optimal bounds on the mean squared errors of our estimators. Our finite sample performance bounds for the RSC estimator show that it achieves the optimal balance between the approximation error and the penalty term. Furthermore, our procedure has very low computational complexity, linear in the number of candidate models, making it particularly appealing for large scale problems. We contrast our estimator with the nuclear norm penalized least squares (NNP) estimator, which has an inherently higher computational complexity than RSC, for multivariate regression models. We show that NNP has estimation properties similar to those of RSC, albeit under stronger conditions. However, it is not as parsimonious as RSC. We offer a simple correction of the NNP estimator which leads to consistent rank estimation. We verify and illustrate our theoretical findings via an extensive simulation study.
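To illustrate the mechanics of the rank selection described in the abstract, the following Python/NumPy sketch compares, over candidate ranks k, the residual sum of squares of the best rank-k fit against a penalty growing with k. It is a minimal illustration, not the paper's procedure: the function names (reduced_rank_fits, rsc_select) are ours, and the penalty is simplified to a user-supplied constant mu times k, whereas the paper's penalty is proportional to the number of parameters of the rank-k model and is calibrated from the noise level.

```python
import numpy as np


def reduced_rank_fits(X, Y):
    """Least-squares fit of Y on X, plus the SVD of the fitted values.

    Every rank-k reduced rank fit is a truncation of this SVD, so one
    decomposition suffices to evaluate all candidate ranks.
    """
    B_ols = np.linalg.pinv(X) @ Y          # pseudoinverse handles rank-deficient X
    fitted = X @ B_ols                      # projection of Y onto the column space of X
    U, s, Vt = np.linalg.svd(fitted, full_matrices=False)
    return B_ols, U, s, Vt


def rsc_select(X, Y, mu):
    """Select the rank minimizing ||Y - X B_k||_F^2 + mu * k.

    The penalty mu * k is a simplified stand-in for the paper's penalty;
    mu must be supplied by the user (the paper gives its calibration).
    Returns the selected rank and the corresponding coefficient matrix.
    """
    B_ols, U, s, Vt = reduced_rank_fits(X, Y)
    rss_full = np.sum((Y - X @ B_ols) ** 2)

    best_k, best_crit = 0, np.sum(Y ** 2)   # rank 0: the zero coefficient matrix
    for k in range(1, len(s) + 1):
        # Truncating singular values k+1, ... adds their squared sum to the RSS.
        rss_k = rss_full + np.sum(s[k:] ** 2)
        crit = rss_k + mu * k
        if crit < best_crit:
            best_k, best_crit = k, crit

    # Rebuild the rank-best_k fitted values and the matching coefficient matrix.
    fitted_k = U[:, :best_k] * s[:best_k] @ Vt[:best_k, :]
    B_k = np.linalg.pinv(X) @ fitted_k
    return best_k, B_k
```

The key point mirrored from the abstract is the computational cost: a single least-squares fit and one SVD yield all candidate models, so the selection step is linear in the number of candidate ranks.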
Pages: 1282-1309
Page count: 28