SHARP VARIABLE SELECTION OF A SPARSE SUBMATRIX IN A HIGH-DIMENSIONAL NOISY MATRIX

Cited by: 14

Authors:

Butucea, Cristina [1, 2]

Ingster, Yuri I.

Suslina, Irina A. [3]

Affiliations:

[1] Univ Paris Est, CNRS, UPEMLV, LAMA,UMR 8050,UPEC, F-77454 Marne La Vallee, France

[2] CREST, F-92240 Malakoff, France

[3] St Petersburg Natl Res Univ Informat Technol Mech, St Petersburg 197101, Russia

Source:

ESAIM-PROBABILITY AND STATISTICS | 2015 / Vol. 19

Keywords:

Estimation; minimax testing; large matrices; selection of sparse signal; sharp selection bounds; variable selection; LARGE-AVERAGE;

DOI:

10.1051/ps/2014017

Chinese Library Classification (CLC) number:

O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];

Subject classification codes:

020208 ; 070103 ; 0714 ;

Abstract:

We observe an N x M matrix of independent, identically distributed Gaussian random variables which are centered except for elements of some submatrix of size n x m where the mean is larger than some a > 0. The submatrix is sparse in the sense that n/N and m/M tend to 0, whereas n, m, N and M tend to infinity. We consider the problem of selecting the random variables with significantly large mean values, as was also considered by [M. Kolar, S. Balakrishnan, A. Rinaldo and A. Singh, NIPS (2011)]. We give sufficient conditions on a as a function of n, m, N and M and construct a uniformly consistent procedure in order to do sharp variable selection. We also prove the minimax lower bounds under necessary conditions which are complementary to the previous conditions. The critical values a* separating the necessary and sufficient conditions are sharp (we show exact constants), whereas [M. Kolar, S. Balakrishnan, A. Rinaldo and A. Singh, NIPS (2011)] only prove rate optimality and focus on suboptimal computationally feasible selectors. Note that rate optimality in this problem leaves out a large set of possible parameters, where we do not know whether consistent selection is possible.
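
The observation model in the abstract can be illustrated with a small simulation. The sketch below is not the procedure constructed in the paper: it assumes unit noise variance and a submatrix mean exactly equal to a (the boundary case of "larger than some a"), and it uses a naive selector that keeps the n rows and m columns with the largest sums. The function names `simulate` and `select_by_sums` are hypothetical and introduced only for this illustration.

```python
import random


def simulate(N, M, rows, cols, a, seed=0):
    """Draw an N x M matrix of independent N(0, 1) entries, then shift the
    entries in rows x cols by a (assumption: mean exactly a, unit variance)."""
    rng = random.Random(seed)
    rows, cols = set(rows), set(cols)
    return [
        [rng.gauss(0.0, 1.0) + (a if i in rows and j in cols else 0.0)
         for j in range(M)]
        for i in range(N)
    ]


def select_by_sums(Y, n, m):
    """Naive selector (not the paper's): keep the n rows and the m columns
    with the largest sums, returned as sorted index lists."""
    row_sums = [sum(row) for row in Y]
    col_sums = [sum(col) for col in zip(*Y)]  # zip(*Y) iterates over columns
    top_rows = sorted(range(len(row_sums)),
                      key=lambda i: row_sums[i], reverse=True)[:n]
    top_cols = sorted(range(len(col_sums)),
                      key=lambda j: col_sums[j], reverse=True)[:m]
    return sorted(top_rows), sorted(top_cols)
```

Calling `select_by_sums(simulate(60, 50, [3, 17, 42], [5, 8, 30, 44], a=50.0), 3, 4)` with such a strong signal should return the planted index sets. The regime studied in the abstract concerns how small a can be while consistent selection remains possible, which this sketch does not address.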


Pages: 115-134

Number of pages: 20

