Analysis of Testing-Based Forward Model Selection

Times cited: 7
Author
Kozbur, Damian [1]
Affiliation
[1] Univ Zurich, Dept Econ, Zurich, Switzerland
Keywords
Model selection; forward regression; sparsity; hypothesis testing; variable selection; confidence intervals; least squares; regression; heteroskedasticity; inference; Lasso; time
DOI
10.3982/ECTA16273
Chinese Library Classification (CLC) number
F [Economics]
Subject classification code
02
Abstract
This paper analyzes a procedure called Testing-Based Forward Model Selection (TBFMS) in linear regression problems. This procedure inductively selects covariates that add predictive power into a working statistical model before estimating a final regression. The criterion for deciding which covariate to include next and when to stop including covariates is derived from a profile of traditional statistical hypothesis tests. This paper proves probabilistic bounds, which depend on the quality of the tests, for prediction error and the number of selected covariates. As an example, the bounds are then specialized to a case with heteroscedastic data, with tests constructed with the help of Huber-Eicker-White standard errors. Under the assumed regularity conditions, these tests lead to estimation convergence rates matching other common high-dimensional estimators including Lasso.
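The abstract describes the procedure only at a high level. Below is a minimal sketch of the general idea, built on assumptions that are ours and not necessarily the paper's. At each step, every remaining covariate is added to an OLS fit alongside the covariates already selected. Its coefficient is then tested with a heteroskedasticity-robust t-statistic, using the HC0 (Huber-Eicker-White) sandwich variance (X'X)^{-1}(sum_i e_i^2 x_i x_i')(X'X)^{-1}. The candidate with the largest |t| enters, and selection stops once no candidate clears a critical value. The Bonferroni-style threshold, the helper names `hc0_tstat` and `tbfms_sketch`, and the omission of an intercept (data assumed centered) are hypothetical choices for illustration. The paper's actual test construction and stopping rule may differ.

```python
import numpy as np
from statistics import NormalDist


def hc0_tstat(X, y, j):
    """OLS t-statistic for column j of X using HC0 (Huber-Eicker-White) standard errors."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ (X.T @ y)
    resid = y - X @ beta
    # "Meat" of the sandwich: sum_i e_i^2 x_i x_i'
    meat = X.T @ (X * resid[:, None] ** 2)
    cov = XtX_inv @ meat @ XtX_inv
    return beta[j] / np.sqrt(cov[j, j])


def tbfms_sketch(X, y, alpha=0.05, max_steps=None):
    """Hypothetical testing-based forward selection (illustrative, not the paper's exact rule).

    Assumes centered data (no intercept). Returns the selected column indices
    in order of entry and the OLS coefficients refit on those columns.
    """
    n, p = X.shape
    # Assumed Bonferroni-style two-sided normal critical value over p candidates.
    threshold = NormalDist().inv_cdf(1 - alpha / (2 * p))
    if max_steps is None:
        max_steps = min(p, n - 1)
    selected, remaining = [], list(range(p))
    while remaining and len(selected) < max_steps:
        # Test each candidate as the newest column alongside the selected ones.
        stats = [abs(hc0_tstat(X[:, selected + [j]], y, len(selected)))
                 for j in remaining]
        best = int(np.argmax(stats))
        if stats[best] < threshold:
            break  # no candidate passes the test: stop adding covariates
        selected.append(remaining.pop(best))
    # Final regression on the selected covariates only.
    if selected:
        coef, *_ = np.linalg.lstsq(X[:, selected], y, rcond=None)
    else:
        coef = np.zeros(0)
    return selected, coef


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 500, 20
    X = rng.standard_normal((n, p))
    beta = np.zeros(p)
    beta[[0, 3, 7]] = [2.0, -1.5, 1.0]
    # Heteroskedastic noise: its scale depends on column 1.
    y = X @ beta + rng.standard_normal(n) * (0.5 + 0.5 * np.abs(X[:, 1]))
    sel, coef = tbfms_sketch(X, y, alpha=0.01)
    print(sel, np.round(coef, 2))
```

The robust t-statistic is what makes the sketch tolerate heteroskedastic noise, as in the simulated example, where the noise scale depends on a covariate that is not itself in the model. Swapping in a different test only requires replacing `hc0_tstat`, which mirrors the abstract's point that the guarantees depend on the quality of the tests.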
Pages: 2147-2173
Number of pages: 27