Variable selection for both outcomes and predictors: sparse multivariate principal covariates regression

Cited by: 0
Authors
Park, Soogeun [1]
Ceulemans, Eva [2]
Van Deun, Katrijn [1]
Affiliations
[1] Tilburg Univ, Tilburg, Netherlands
[2] Katholieke Univ Leuven, Leuven, Belgium
Keywords
Outcome variable selection; Response variable selection; Response selection; Variable selection; Principal covariates regression; Dimension reduction; MODEL SELECTION; MULTI-TRAIT; COMPONENTS; ALGORITHM; NUMBER; GWAS;
DOI
10.1007/s10994-024-06520-3
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Datasets comprising large sets of both predictor and outcome variables are becoming more widely used in research. In addition to the well-known problems of model complexity and predictor variable selection, predictive modelling with such large data also presents a relatively novel and under-studied challenge: outcome variable selection. Certain outcome variables in the data may not be adequately predicted by the given sets of predictors. In this paper, we propose Sparse Multivariate Principal Covariates Regression, a method that addresses these issues together by extending the Principal Covariates Regression model with sparsity penalties on both predictor and outcome variables. Our method is among the first to perform variable selection for predictors and outcomes simultaneously. Moreover, by relying on summary variables that explain the variance in both predictor and outcome variables, the method offers a sparse and succinct model representation of the data. In a simulation study, the method outperformed methods with similar aims, such as sparse Partial Least Squares, in predicting the outcome variables and recovering the population parameters. Lastly, we applied the method to an empirical dataset to illustrate its use in practice.
Pages: 7319-7370
Page count: 52