Hierarchical Testing in the High-Dimensional Setting With Correlated Variables

Cited by: 17
Authors
Mandozzi, Jacopo [1]
Bühlmann, Peter [2]
Affiliations
[1] Libera AG, Stockerstr 34, CH-8022 Zurich, Switzerland
[2] ETH, Dept Math, Stat, CH-8092 Zurich, Switzerland
Keywords
Familywise error rate; Hierarchical clustering; High-dimensional variable selection; Lasso; Linear model; Minimal true detection; Multiple testing; Sample splitting; CONFIDENCE-INTERVALS; REGRESSION; LASSO; DISCOVERY; SELECTION
DOI
10.1080/01621459.2015.1007209
CLC classification number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject classification code
020208; 070103; 0714
Abstract
We propose a method for testing whether hierarchically ordered groups of potentially correlated variables are significant for explaining a response in a high-dimensional linear model. In the presence of highly correlated variables, which is very common in high-dimensional data, it seems indispensable to go beyond inferring individual regression coefficients, and we show that detecting the smallest groups of variables (MTDs: minimal true detections) is realistic. Thanks to the hierarchy among the groups of variables, a powerful multiple testing adjustment is possible, which leads to a data-driven choice of the resolution level for the groups. Our procedure, based on repeated sample splitting, is shown to asymptotically control the familywise error rate, and we provide empirical results for simulated and real data that complement the theoretical analysis. Supplementary materials for this article are available online.
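The abstract describes the procedure only at a high level. As a rough illustration, the sketch below combines three generic ingredients under stated assumptions; it is not the authors' implementation. It assumes (i) per-group p-values from B sample splits are combined with the fixed-γ quantile rule of Meinshausen, Meier and Bühlmann (2009), min(1, q_γ({p_b/γ})); (ii) a size-weighted hierarchical adjustment min(1, p_C · p/|C|) is made monotone along the tree, in the spirit of Meinshausen's (2008) hierarchical testing; and (iii) an MTD is read off as a rejected group none of whose child groups is rejected. The group tests themselves, the Lasso screening step, and the paper's exact adjustment are not reproduced. The function names `quantile_aggregate`, `hierarchical_adjust` and `minimal_true_detections`, the tree representation, and all p-values are hypothetical.

```python
import math
from typing import Dict, List, Optional, Set, Tuple

# Sketch only, not the paper's procedure. Hypothetical representation:
# group name -> (variable indices, parent name or None for the root).
Hierarchy = Dict[str, Tuple[Set[int], Optional[str]]]


def quantile_aggregate(split_pvals: List[float], gamma: float = 0.5) -> float:
    """Combine one group's p-values from B sample splits as min(1, q_gamma({p_b / gamma}))."""
    scaled = sorted(p / gamma for p in split_pvals)
    # Empirical gamma-quantile taken as the ceil(gamma * B)-th smallest scaled value.
    k = max(1, math.ceil(gamma * len(scaled)))
    return min(1.0, scaled[k - 1])


def hierarchical_adjust(tree: Hierarchy, raw: Dict[str, float], p: int) -> Dict[str, float]:
    """Size-weighted adjustment min(1, p_C * p / |C|), made monotone down the tree."""
    adjusted: Dict[str, float] = {}

    def adjust(name: str) -> float:
        if name not in adjusted:
            members, parent = tree[name]
            own = min(1.0, raw[name] * p / len(members))
            # A group's adjusted p-value is at least its parent's, so a group can
            # only be rejected if all of its ancestors are rejected as well.
            adjusted[name] = own if parent is None else max(own, adjust(parent))
        return adjusted[name]

    for name in tree:
        adjust(name)
    return adjusted


def minimal_true_detections(tree: Hierarchy, adjusted: Dict[str, float],
                            alpha: float = 0.05) -> Set[str]:
    """Rejected groups none of whose child groups is rejected."""
    rejected = {name for name, q in adjusted.items() if q <= alpha}
    children: Dict[str, List[str]] = {name: [] for name in tree}
    for name, (_, parent) in tree.items():
        if parent is not None:
            children[parent].append(name)
    return {name for name in rejected if not any(c in rejected for c in children[name])}


# Toy example with p = 4 variables and a binary hierarchy (made-up p-values).
tree: Hierarchy = {
    "root": ({0, 1, 2, 3}, None),
    "A": ({0, 1}, "root"), "B": ({2, 3}, "root"),
    "A0": ({0}, "A"), "A1": ({1}, "A"), "B2": ({2}, "B"), "B3": ({3}, "B"),
}
raw = {"root": 0.001, "A": 0.002, "B": 0.3, "A0": 0.04, "B2": 0.5, "B3": 0.6}
raw["A1"] = quantile_aggregate([0.002, 0.004, 0.01, 0.2])  # four hypothetical splits
adjusted = hierarchical_adjust(tree, raw, p=4)
print(minimal_true_detections(tree, adjusted))  # prints {'A1'}
```

In this toy run the root and group A are rejected, but only the single-variable group A1 counts as a minimal detection, which illustrates how a hierarchy can let the data decide how fine a resolution is supported. Taking the maximum along the path to the root is what keeps the rejections nested.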
Pages: 331-343
Number of pages: 13