Variable selection for both outcomes and predictors: sparse multivariate principal covariates regression

Authors
Park, Soogeun [1]
Ceulemans, Eva [2]
Van Deun, Katrijn [1]
Affiliations
[1] Tilburg Univ, Tilburg, Netherlands
[2] Katholieke Univ Leuven, Leuven, Belgium
Keywords
Outcome variable selection; Response variable selection; Response selection; Variable selection; Principal covariates regression; Dimension reduction; MODEL SELECTION; MULTI-TRAIT; COMPONENTS; ALGORITHM; NUMBER; GWAS;
DOI
10.1007/s10994-024-06520-3
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Datasets comprising large sets of both predictor and outcome variables are becoming more widely used in research. In addition to the well-known problems of model complexity and predictor variable selection, predictive modelling with such large data also presents a relatively novel and under-studied challenge: outcome variable selection. Certain outcome variables in the data may not be adequately predicted by the given sets of predictors. In this paper, we propose Sparse Multivariate Principal Covariates Regression, a method that addresses these issues together by extending the Principal Covariates Regression model to incorporate sparsity penalties on both predictor and outcome variables. Our method is among the first to perform variable selection for predictors and outcomes simultaneously. Moreover, by relying on summary variables that explain the variance in both predictor and outcome variables, the method offers a sparse and succinct model representation of the data. In a simulation study, the method outperformed methods with similar aims, such as sparse Partial Least Squares, in predicting the outcome variables and recovering the population parameters. Lastly, we applied the method to an empirical dataset to illustrate its use in practice.
Pages: 7319-7370
Number of pages: 52