Algorithms for Subset Selection in Linear Regression

被引:0
|
作者
Das, Abhimanyu [1 ]
Kempe, David [1 ]
机构
[1] Univ So Calif, Dept Comp Sci, Los Angeles, CA 90089 USA
来源
STOC'08: PROCEEDINGS OF THE 2008 ACM INTERNATIONAL SYMPOSIUM ON THEORY OF COMPUTING | 2008年
关键词
Algorithms; Theory; VARIABLES;
D O I
暂无
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
We study the problem of selecting a subset of k random variables to observe that will yield the best linear prediction of another variable of interest, given the pairwise correlations between the observation variables and the predictor variable. Under approximation preserving reductions, this problem is also equivalent to the "sparse approximation" problem of approximating signals concisely. We propose and analyze exact and approximation algorithms for several special cases of practical interest. We give an FPTAS when the covariance matrix has constant bandwidth, and exact algorithms when the associated covariance graph, consisting of edges for pairs of variables with nor)zero correlation, forms a tree or has a large (known) independent set. Furthermore, we give all exact algorithm when the variables can be embedded into a line such that the covariance decreases exponentially in the distance, and a constant-factor approximation when the variables have no "conditional suppressor variables". Much of our reasoning is based on perturbation results for the R-2 multiple correlation measure, frequently used as a measure for "goodness-of-fit statistics". It lies at the core of our FPTAS, and also allows us to extend exact algorithms to approximation algorithms when the matrix "nearly" falls into one of the above classes. We also use perturbation analysis to prove approximation guarantees For the widely used "Forward Regression" heuristic when the observation variables are nearly independent.
引用
收藏
页码:45 / 54
页数:10
相关论文
共 50 条
  • [1] Group subset selection for linear regression
    Guo, Yi
    Berman, Mark
    Gao, Junbin
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2014, 75 : 39 - 52
  • [2] Subset selection in multiple linear regression models: A hybrid of genetic and simulated annealing algorithms
    Orkcu, H. Hasan
    APPLIED MATHEMATICS AND COMPUTATION, 2013, 219 (23) : 11018 - 11028
  • [3] Subset selection for multiple linear regression via optimization
    Park, Young Woong
    Klabjan, Diego
    JOURNAL OF GLOBAL OPTIMIZATION, 2020, 77 (03) : 543 - 574
  • [4] Subset selection for multiple linear regression via optimization
    Young Woong Park
    Diego Klabjan
    Journal of Global Optimization, 2020, 77 : 543 - 574
  • [5] Algorithms for robust model selection in linear regression
    Morgenthaler, S
    Welsch, RE
    Zenide, A
    THEORY AND APPLICATION OF RECENT ROBUST METHODS, 2004, : 195 - 206
  • [6] Subset selection in linear regression using generalized ridge estimator
    Dorugade A.V.
    Kashid D.N.
    Journal of Statistical Theory and Practice, 2010, 4 (3) : 375 - 389
  • [7] Subset selection in multiple linear regression in the presence of outlier and multicollinearity
    Jadhav, Nileshkumar H.
    Kashid, Dattatraya N.
    Kulkarni, Subhash R.
    STATISTICAL METHODOLOGY, 2014, 19 : 44 - 59
  • [8] A more general criterion for subset selection in multiple linear regression
    Kashid, DN
    Kulkarni, SR
    COMMUNICATIONS IN STATISTICS-THEORY AND METHODS, 2002, 31 (05) : 795 - 811
  • [9] Carousel Greedy Algorithms for Feature Selection in Linear Regression
    Wang, Jiaqi
    Golden, Bruce
    Cerrone, Carmine
    ALGORITHMS, 2023, 16 (09)
  • [10] lmSubsets: Exact Variable-Subset Selection in Linear Regression for R
    Hofmann, Marc
    Gatu, Cristian
    Kontoghiorghes, Erricos J.
    Colubi, Ana
    Zeileis, Achim
    JOURNAL OF STATISTICAL SOFTWARE, 2020, 93 (03):