Power of the spacing test for least-angle regression

Cited by: 7
Authors
Azais, Jean-Marc [1]
De Castro, Yohann [2]
Mourareau, Stephane [1]
Affiliations
[1] Univ Paul Sabatier, Inst Math Toulouse, Route Narbonne, F-31062 Toulouse, France
[2] Univ Paris Saclay, Univ Paris Sud, Lab Math Orsay, CNRS, F-91405 Orsay, France
Keywords
hypothesis testing; L-1-minimization; power; spacing test; DANTZIG SELECTOR; LASSO; RECOVERY;
DOI
10.3150/16-BEJ885
Chinese Library Classification
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Subject classification codes
020208 ; 070103 ; 0714 ;
Abstract
Recent advances in Post-Selection Inference have shown that conditional testing is relevant and tractable in high dimensions. In the Gaussian linear model, further works have derived unconditional test statistics such as the Kac-Rice Pivot for general penalized problems. In order to test the global null, a prominent offspring of this breakthrough is the Spacing test, which accounts for the relative separation between the first two knots of the celebrated least-angle regression (LARS) algorithm. However, no results have been shown regarding the distribution of these test statistics under the alternative. For the first time, this paper addresses this important issue for the Spacing test and shows that it is unconditionally unbiased. Furthermore, we provide the first extension of the Spacing test to the frame of unknown noise variance. More precisely, we investigate the power of the Spacing test for LARS and prove that it is unbiased: its power is always greater than or equal to the significance level α. In particular, we describe the power of this test under various scenarios: we prove that its rejection region is optimal when the predictors are orthogonal; as the level α goes to zero, we show that the probability of getting a true positive is much greater than α; and we give a detailed description of its power in the case of two predictors. Moreover, we numerically compare the Spacing test for LARS, Pearson's chi-squared test (goodness of fit) and a numerical testing procedure based on the maximal correlation. When the noise variance is unknown, our analysis yields a new test statistic that can be computed in cubic time in the population size and which we refer to as the t-Spacing test for LARS. The t-Spacing test involves the first two knots of the LARS algorithm and we give its distribution under the null hypothesis. Interestingly, numerical experiments indicate that the t-Spacing test for LARS enjoys the same aforementioned properties as the Spacing test.
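To make the role of the first two LARS knots concrete, here is a minimal sketch for the orthogonal-design case with known noise level. It assumes the form of the statistic usually given in the spacing-test literature, namely the ratio of standard Gaussian tail probabilities evaluated at the first two knots; that formula, the function names and the `sigma` parameter are illustrative assumptions and do not come from this record. With orthonormal predictors, the first two knots are simply the two largest absolute values among the entries of X^T y.

```python
import math


def normal_sf(x: float) -> float:
    """Standard normal survival function, 1 - Phi(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def spacing_pvalue_orthogonal(correlations, sigma=1.0):
    """Sketch of a spacing-test p-value (hypothetical helper).

    `correlations` holds the entries of X^T y. Assumption: the columns of X
    are orthonormal, so the first two LARS knots are the two largest
    absolute correlations. `sigma` is the known noise standard deviation.
    Not numerically stabilised for very large knots, where both tail
    probabilities underflow to zero.
    """
    knots = sorted((abs(c) for c in correlations), reverse=True)
    if len(knots) < 2:
        raise ValueError("need at least two predictors")
    lam1, lam2 = knots[0], knots[1]
    # Assumed statistic: ratio of Gaussian tails at the first two knots.
    return normal_sf(lam1 / sigma) / normal_sf(lam2 / sigma)


if __name__ == "__main__":
    # One correlation clearly separated from the rest gives a small p-value.
    print(spacing_pvalue_orthogonal([3.0, -0.5, 0.2]))
```

Under this assumed form, the p-value is small when the first knot is well separated from the second, and it equals 1 when the two knots coincide.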
Pages: 465-492
Page count: 28