Statistical learning and selective inference

Cited by: 238
Authors
Taylor, Jonathan [1 ]
Tibshirani, Robert J. [2 ,3 ]
Affiliations
[1] Stanford Univ, Dept Stat, Stanford, CA 94305 USA
[2] Stanford Univ, Dept Hlth Res & Policy, Stanford, CA 94305 USA
[3] Stanford Univ, Dept Stat, Stanford, CA 94305 USA
Funding
US National Science Foundation; US National Institutes of Health;
Keywords
inference; P values; lasso;
DOI
10.1073/pnas.1507583112
Chinese Library Classification (CLC)
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline codes
07; 0710; 09;
Abstract
We describe the problem of "selective inference." This addresses the following challenge: Having mined a set of data to find potential associations, how do we properly assess the strength of these associations? The fact that we have "cherry-picked" (searched for the strongest associations) means that we must set a higher bar for declaring significant the associations that we see. This challenge becomes more important in the era of big data and complex statistical modeling. The cherry tree (dataset) can be very large and the tools for cherry picking (statistical learning methods) are now very sophisticated. We describe some recent new developments in selective inference and illustrate their use in forward stepwise regression, the lasso, and principal components analysis.
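The selection effect described in the abstract can be seen in a few lines of simulation. The sketch below is our own illustration, not code from the paper, and all function names are ours: it draws pure-noise data, "cherry-picks" the predictor most correlated with the response among 20 candidates, and shows that the naive p-value for the selected predictor falls below 0.05 far more often than the nominal 5%.

```python
import math
import random

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def correlation(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def naive_selected_pvalue(n=100, p=20, rng=random):
    """One draw under the global null: y is pure noise, so no predictor
    is truly associated.  We select the predictor with the largest |z|
    and report its naive two-sided p-value, ignoring the selection."""
    y = [rng.gauss(0.0, 1.0) for _ in range(n)]
    best_z = 0.0
    for _ in range(p):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = math.sqrt(n) * correlation(x, y)  # approximately N(0,1) under the null
        if abs(z) > abs(best_z):
            best_z = z
    return 2.0 * (1.0 - normal_cdf(abs(best_z)))

rng = random.Random(0)
pvals = [naive_selected_pvalue(rng=rng) for _ in range(300)]
frac_significant = sum(pv < 0.05 for pv in pvals) / len(pvals)
# Without selection, about 5% of null p-values would fall below 0.05;
# after picking the best of 20 predictors, the rate is far higher
# (roughly 1 - 0.95**20, i.e. around 60%).
```

This is exactly the "higher bar" the authors argue for: a p-value computed as if the selected predictor had been chosen in advance is badly anti-conservative, and the paper's selective-inference machinery corrects the p-value to account for the search.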
Pages: 7629-7634
Page count: 6