Variable selection with LASSO regression for complex survey data

Times cited: 6
Authors
Iparragirre, Amaia [1, 4]
Lumley, Thomas [2]
Barrio, Irantzu [1, 3]
Arostegui, Inmaculada [1, 3]
Affiliations
[1] Univ Basque Country UPV EHU, Dept Math, Leioa 48940, Spain
[2] Univ Auckland, Dept Stat, Auckland 1142, New Zealand
[3] BCAM Basque Ctr Appl Math, Bilbao 48009, Spain
[4] Fac Sci & Technol, Dept Math, Barrio Sarriena s-n, Bizkaia 48940, Basque Country, Spain
Source
STAT | 2023 / Vol. 12 / Issue 01
Keywords
complex survey data; cross-validation; LASSO regression; replicate weights; variable selection; MODELS; INFERENCE
DOI
10.1002/sta4.578
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract
Variable selection is an important step in building good prediction models. LASSO regression is one of the most commonly used methods for this purpose, and cross-validation is the most widely applied validation technique for choosing its tuning parameter (λ). In a complex survey framework, validation techniques are closely related to "replicate weights". However, to our knowledge, replicate weights have never been used in a LASSO regression context, and applying LASSO regression models to complex survey data can be challenging. The goal of this paper is twofold. On the one hand, we analyze the performance of replicate weights methods for selecting the tuning parameter when fitting LASSO regression models to complex survey data. On the other hand, we propose new replicate weights methods for the same purpose. In particular, we propose a new design-based cross-validation method that combines traditional cross-validation with replicate weights. Through an extensive simulation study, we analyze the performance of all these methods and compare them with the traditional cross-validation technique for selecting the tuning parameter of LASSO regression models. The results suggest a considerable improvement when the proposed design-based cross-validation is used instead of traditional cross-validation.
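To make the idea of choosing λ by design-based cross-validation more concrete, here is a minimal Python sketch under stated assumptions; it is not the authors' implementation. It assumes a stratified cluster design in which every stratum has at least two primary sampling units (PSUs). Folds are formed by randomly assigning whole PSUs to folds separately within each stratum. The training weights of retained PSUs are rescaled by n_h/(n_h - m_h) in a jackknife (JKn-like) fashion, where n_h is the number of PSUs in stratum h and m_h is the number held out; this rescaling is an illustrative assumption, not necessarily the exact scheme used in the paper. Held-out prediction error is weighted by the original sampling weights. The helper names (`weighted_lasso`, `design_folds`, `dcv_select_lambda`) are hypothetical, and the fit is a plain weighted coordinate-descent LASSO for a continuous response.

```python
import random
from collections import Counter, defaultdict


def soft_threshold(z, g):
    """Soft-thresholding operator S(z, g) used by LASSO coordinate descent."""
    if z > g:
        return z - g
    if z < -g:
        return z + g
    return 0.0


def weighted_lasso(X, y, w, lam, n_iter=200):
    """Minimize (1/(2W)) * sum_i w_i (y_i - b0 - x_i.b)^2 + lam * ||b||_1, W = sum(w).

    The intercept b0 is unpenalized. Plain cyclic coordinate descent."""
    p = len(X[0])
    W = sum(w)
    b0 = sum(wi * yi for wi, yi in zip(w, y)) / W
    b = [0.0] * p
    resid = [yi - b0 for yi in y]
    for _ in range(n_iter):
        # Re-centre the (unpenalized) intercept on the weighted residual mean.
        shift = sum(wi * ri for wi, ri in zip(w, resid)) / W
        b0 += shift
        resid = [ri - shift for ri in resid]
        for j in range(p):
            xj = [row[j] for row in X]
            denom = sum(wi * v * v for wi, v in zip(w, xj)) / W
            if denom == 0.0:
                continue
            # Weighted correlation of x_j with the partial residual (b_j added back).
            rho = sum(wi * v * (ri + v * b[j]) for wi, v, ri in zip(w, xj, resid)) / W
            new = soft_threshold(rho, lam) / denom
            delta = new - b[j]
            if delta != 0.0:
                resid = [ri - delta * v for ri, v in zip(resid, xj)]
                b[j] = new
    return b0, b


def design_folds(strata, psu, K, seed=0):
    """Randomly assign whole PSUs to K folds, separately within each stratum."""
    rng = random.Random(seed)
    by_stratum = defaultdict(set)
    for h, c in zip(strata, psu):
        by_stratum[h].add(c)
    fold_of = {}
    for h in sorted(by_stratum):
        clusters = sorted(by_stratum[h])
        if len(clusters) < 2:
            raise ValueError(f"stratum {h!r} needs at least 2 PSUs")
        rng.shuffle(clusters)
        for i, c in enumerate(clusters):
            fold_of[(h, c)] = i % K
    return fold_of


def dcv_select_lambda(X, y, w, strata, psu, lambdas, K=5, seed=0):
    """Pick the lambda minimizing weighted held-out squared error over design-based folds."""
    n = len(y)
    fold_of = design_folds(strata, psu, K, seed)
    n_h = Counter(h for (h, _c) in fold_of)  # number of PSUs per stratum
    keys = list(zip(strata, psu))
    errors = []
    for lam in lambdas:
        sse, wsum = 0.0, 0.0
        for k in range(K):
            test = [fold_of[key] == k for key in keys]
            if not any(test):
                continue
            m_h = Counter(h for (h, _c), f in fold_of.items() if f == k)  # PSUs held out
            train = [i for i in range(n) if not test[i]]
            # Assumed JKn-style rescaling of the weights of the retained PSUs.
            w_tr = [w[i] * n_h[strata[i]] / (n_h[strata[i]] - m_h[strata[i]]) for i in train]
            b0, b = weighted_lasso([X[i] for i in train], [y[i] for i in train], w_tr, lam)
            for i in range(n):
                if test[i]:
                    pred = b0 + sum(bj * xij for bj, xij in zip(b, X[i]))
                    sse += w[i] * (y[i] - pred) ** 2  # original sampling weight
                    wsum += w[i]
        errors.append(sse / wsum)
    best = min(range(len(lambdas)), key=errors.__getitem__)
    return lambdas[best], errors
```

Assigning whole PSUs to folds, rather than individual units, keeps correlated units from the same cluster on the same side of each split, which is one plausible reason a design-based scheme can behave better than ordinary K-fold cross-validation that splits units at random.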
Pages: 16