IPAD: Stable Interpretable Forecasting with Knockoffs Inference

Cited by: 30
Authors
Fan, Yingying [1]
Lv, Jinchi [1]
Sharifvaghefi, Mahrad [2]
Uematsu, Yoshimasa [1]
Affiliations
[1] Univ Southern Calif, Data Sci & Operat Dept, Los Angeles, CA 90007 USA
[2] Univ Southern Calif, Dept Econ, Los Angeles, CA 90007 USA
Funding
Japan Society for the Promotion of Science; US National Institutes of Health; US National Science Foundation
Keywords
Large-scale inference and FDR; Latent factors; Model-X knockoffs; Power; Reproducibility; Stability; FALSE DISCOVERY RATE; SHRINKAGE; NUMBER; SELECTION;
DOI
10.1080/01621459.2019.1654878
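Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract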
Interpretability and stability are two important features that are desired in many contemporary big data applications arising in statistics, economics, and finance. While the former is enjoyed to some extent by many existing forecasting approaches, the latter in the sense of controlling the fraction of wrongly discovered features which can enhance greatly the interpretability is still largely underdeveloped. To this end, in this article, we exploit the general framework of model-X knockoffs introduced recently in Candes, Fan, Janson and Lv [(2018), "Panning for Gold: 'model X' Knockoffs for High Dimensional Controlled Variable Selection," Journal of the Royal Statistical Society, Series B, 80, 551-577], which is nonconventional for reproducible large-scale inference in that the framework is completely free of the use of p-values for significance testing, and suggest a new method of intertwined probabilistic factors decoupling (IPAD) for stable interpretable forecasting with knockoffs inference in high-dimensional models. The recipe of the method is constructing the knockoff variables by assuming a latent factor model that is exploited widely in economics and finance for the association structure of covariates. Our method and work are distinct from the existing literature in that we estimate the covariate distribution from data instead of assuming that it is known when constructing the knockoff variables, our procedure does not require any sample splitting, we provide theoretical justifications on the asymptotic false discovery rate control, and the theory for the power analysis is also established. Several simulation examples and the real data analysis further demonstrate that the newly suggested method has appealing finite-sample performance with desired interpretability and stability compared to some popularly used forecasting methods. Supplementary materials for this article are available online.
Pages: 1822-1834
Number of pages: 13