COMPUTATIONAL AND STATISTICAL BOUNDARIES FOR SUBMATRIX LOCALIZATION IN A LARGE NOISY MATRIX

被引:31
作者
Cai, T. Tony [1 ]
Liang, Tengyuan [1 ]
Rakhlin, Alexander [1 ]
机构
[1] Univ Penn, Wharton Sch, Dept Stat, Philadelphia, PA 19104 USA
关键词
Computational boundary; computational complexity; detection; planted clique; lower bounds; minimax; signal-to-noise ratio; statistical boundary; submatrix localization; LOW-RANK APPROXIMATION; PROBABILITY-INEQUALITIES; OPTIMAL RATES; SPARSE PCA; ALGORITHMS;
D O I
10.1214/16-AOS1488
中图分类号
O21 [概率论与数理统计]; C8 [统计学];
学科分类号
020208 ; 070103 ; 0714 ;
摘要
We study in this paper computational and statistical boundaries for submatrix localization. Given one observation of (one or multiple nonoverlapping) signal submatrix (of magnitude. and size k(m) x k(n)) embedded in a large noise matrix (of size mxn), the goal is to optimal identify the support of the signal submatrix computationally and statistically. Two transition thresholds for the signal-to-noise ratio lambda/ sigma are established in terms of m, n, k(m) and k(n). The first threshold, SNRc, corresponds to the computational boundary. We introduce a new linear time spectral algorithm that identifies the submatrix with high probability when the signal strength is above the threshold SNRc. Below this threshold, it is shown that no polynomial time algorithm can succeed in identifying the submatrix, under the hidden clique hypothesis. The second threshold, SNRs, captures the statistical boundary, below which no method can succeed in localization with probability going to one in the minimax sense. The exhaustive search method successfully finds the submatrix above this threshold. In marked contrast to submatrix detection and sparse PCA, the results show an interesting phenomenon that SNRc is always significantly larger than SNRs under the sub-Gaussian error model, which implies an essential gap between statistical optimality and computational efficiency for submatrix localization.
引用
收藏
页码:1403 / 1430
页数:28
相关论文
共 46 条
[1]   NOISY MATRIX DECOMPOSITION VIA CONVEX RELAXATION: OPTIMAL RATES IN HIGH DIMENSIONS [J].
Agarwal, Alekh ;
Negahban, Sahand ;
Wainwright, Martin J. .
ANNALS OF STATISTICS, 2012, 40 (02) :1171-1197
[2]  
[Anonymous], PREPRINT
[3]  
[Anonymous], 2014, PREPRINT
[4]   DETECTION OF AN ANOMALOUS CLUSTER IN A NETWORK [J].
Arias-Castro, Ery ;
Candes, Emmanuel J. ;
Durand, Arnaud .
ANNALS OF STATISTICS, 2011, 39 (01) :278-304
[5]  
Balakrishnan S., 2011, NIPS 2011 WORKSHOP C
[7]  
Berthet Q., 2013, PREPRINT
[8]   OPTIMAL DETECTION OF SPARSE PRINCIPAL COMPONENTS IN HIGH DIMENSION [J].
Berthet, Quentin ;
Rigollet, Philippe .
ANNALS OF STATISTICS, 2013, 41 (04) :1780-1815
[9]   MINIMAX BOUNDS FOR SPARSE PCA WITH NOISY HIGH-DIMENSIONAL DATA [J].
Birnbaum, Aharon ;
Johnstone, Iain M. ;
Nadler, Boaz ;
Paul, Debashis .
ANNALS OF STATISTICS, 2013, 41 (03) :1055-1084
[10]  
BUTUCEA C., 2013, PREPRINT