Group subset selection for linear regression

Cited by: 12
Authors
Guo, Yi [1 ]
Berman, Mark [1 ]
Gao, Junbin [2 ]
Affiliations
[1] CSIRO Computat Informat, N Ryde, NSW 1670, Australia
[2] Charles Sturt Univ, Sch Comp & Math, Bathurst, NSW 2795, Australia
Keywords
Subset selection; Group Lasso; Linear regression; Screening; VARIABLE SELECTION; ALGORITHMS; REGULARIZATION; SHRINKAGE; LASSO; MODEL;
DOI
10.1016/j.csda.2014.02.005
Chinese Library Classification
TP39 [Computer applications];
Discipline codes
081203; 0835
Abstract
Two fast group subset selection (GSS) algorithms for the linear regression model are proposed in this paper. GSS finds the best combinations of groups, up to a specified size, that minimise the residual sum of squares. This imposes an ℓ0 constraint on the regression coefficients in a group context and leads to an NP-hard combinatorial optimisation problem. To make the exhaustive search efficient, the GSS algorithms are built on QR decomposition and branch-and-bound techniques. They are suitable for medium-scale problems where finding the most accurate solution is essential. In the application motivating this research, it is natural to require that the coefficients of some of the variables within groups satisfy constraints (e.g. non-negativity). Therefore the GSS algorithms (optionally) calculate the model coefficient estimates during the exhaustive search in order to screen out combinations that do not meet the constraints. The faster of the two GSS algorithms is compared with an extension of the original group Lasso, called the constrained group Lasso (CGL), which is proposed to handle convex constraints and to remove the orthogonality requirement on the variables within each group. CGL is a convex relaxation of the GSS problem and hence more straightforward to solve. Although CGL is inferior to GSS in terms of group selection accuracy, it is a fast approximation to GSS if the optimal regularisation parameter can be determined efficiently and, in some cases, it may serve as a screening procedure to reduce the number of candidate groups. (C) 2014 Elsevier B.V. All rights reserved.
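The ℓ0-in-groups objective described in the abstract can be made concrete with a small brute-force sketch: enumerate combinations of at most k groups, fit ordinary least squares on the selected columns, and keep the combination with the smallest residual sum of squares. This is only an illustrative Python sketch of the search problem, not the authors' QR-decomposition/branch-and-bound implementation, and it omits the optional coefficient-constraint screening; all function and variable names here are assumptions for illustration.

```python
# Illustrative brute-force group subset selection (GSS): pick the combination
# of at most k groups whose columns minimise the residual sum of squares.
# NOTE: this is a naive sketch of the objective, not the paper's efficient
# QR / branch-and-bound algorithm, and it does not enforce sign constraints.
import itertools
import numpy as np

def group_subset_selection(X, y, groups, k):
    """X: (n, p) design matrix; y: (n,) response;
    groups: list of column-index arrays, one per group; k: max number of groups."""
    best_rss, best_combo, best_coef = np.inf, None, None
    for size in range(1, k + 1):
        for combo in itertools.combinations(range(len(groups)), size):
            cols = np.concatenate([groups[g] for g in combo])
            beta, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
            rss = np.sum((y - X[:, cols] @ beta) ** 2)
            if rss < best_rss:
                best_rss, best_combo, best_coef = rss, combo, beta
    return best_combo, best_coef, best_rss

# Toy usage: three groups of two variables each; only groups 0 and 2 are active.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 6))
groups = [np.arange(0, 2), np.arange(2, 4), np.arange(4, 6)]
y = X[:, [0, 1, 4, 5]] @ np.array([1.0, -2.0, 0.5, 3.0]) + 0.1 * rng.standard_normal(100)
print(group_subset_selection(X, y, groups, k=2)[0])  # expected: (0, 2)
```

The point of the sketch is the combinatorial cost: the number of group combinations grows rapidly with the number of groups and with k, which is why the paper's QR-based branch-and-bound search (and the CGL convex relaxation) matters in practice.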
Pages: 39-52
Page count: 14
Related papers
50 records in total
  • [21] Subset selection in multiple linear regression models: A hybrid of genetic and simulated annealing algorithms
    Orkcu, H. Hasan
    APPLIED MATHEMATICS AND COMPUTATION, 2013, 219 (23) : 11018 - 11028
  • [22] Variable selection in linear regression models: Choosing the best subset is not always the best choice
    Hanke, Moritz
    Dijkstra, Louis
    Foraita, Ronja
    Didelez, Vanessa
    BIOMETRICAL JOURNAL, 2024, 66 (01)
  • [23] Common subset selection of inputs in multiresponse regression
    Simila, Timo
    Tikka, Jarkko
    2006 IEEE INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORK PROCEEDINGS, VOLS 1-10, 2006, : 1908 - +
  • [24] ON SUBSET-SELECTION IN NONPARAMETRIC STOCHASTIC REGRESSION
    YAO, QW
    TONG, H
    STATISTICA SINICA, 1994, 4 (01) : 51 - 70
  • [25] SUBSET-SELECTION IN REGRESSION - MILLER,AJ
    MAYEKAWA, S
    JOURNAL OF EDUCATIONAL STATISTICS, 1992, 17 (04): : 375 - 377
  • [26] Training Subset Selection for Support Vector Regression
    Liu, Cenru
    Cen, Jiahao
    PROCEEDINGS OF THE 2019 FEDERATED CONFERENCE ON COMPUTER SCIENCE AND INFORMATION SYSTEMS (FEDCSIS), 2019, : 11 - 14
  • [27] SUBSET-SELECTION IN REGRESSION - MILLER,AJ
    HALDAR, S
    JOURNAL OF MARKETING RESEARCH, 1992, 29 (02) : 270 - 272
  • [28] Subset selection for linear mixed models
    Kowal, Daniel R.
    BIOMETRICS, 2023, 79 (03) : 1853 - 1867
  • [29] Conditional Uncorrelation and Efficient Subset Selection in Sparse Regression
    Wang, Jianji
    Zhang, Shupei
    Liu, Qi
    Du, Shaoyi
    Guo, Yu-Cheng
    Zheng, Nanning
    Wang, Fei-Yue
    IEEE TRANSACTIONS ON CYBERNETICS, 2022, 52 (10) : 10458 - 10467
  • [30] ON THE OPTIMALITY OF BACKWARD REGRESSION: SPARSE RECOVERY AND SUBSET SELECTION
    Ament, Sebastian
    Gomes, Carla
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 5599 - 5603