Stable Variable Selection for High-Dimensional Genomic Data with Strong Correlations

Authors
Sarkar R. [1 ]
Manage S. [2 ]
Gao X. [3 ]
Affiliations
[1] Department of Mathematics and Statistics, University of North Carolina at Greensboro, PO Box 26170, 116 Petty Building, Greensboro, 27402, NC
[2] Department of Mathematics, Texas A&M University, Blocker Building, 3368 TAMU, 155 Ireland Street, College Station, 77840, TX
[3] Meta Platforms, Menlo Park, CA
Funding
U.S. National Science Foundation
Keywords
Bi-level sparsity; Minimax concave penalty; Stability; Strong correlation; Variable selection;
DOI
10.1007/s40745-023-00481-5
Abstract
High-dimensional genomic data often exhibit strong correlations, which lead to unstable and inconsistent estimates from commonly used regularization approaches such as the Lasso and MCP. In this paper, we perform a comparative study of regularization approaches for variable selection under different correlation structures and propose a procedure named rPGBS to address stable variable selection in various strong correlation settings. The procedure repeatedly runs a two-stage hierarchical approach consisting of random pseudo-group clustering followed by bi-level variable selection. Extensive simulation studies and analyses of real high-dimensional genomic datasets demonstrate the advantage of the proposed rPGBS method over some of the most widely used regularization methods. In particular, rPGBS selects variables more stably across a variety of correlation settings than two recent methods that address variable selection with strong correlations: Precision Lasso (Wang et al. in Bioinformatics 35:1181–1187, 2019) and Whitening Lasso (Zhu et al. in Bioinformatics 37:2238–2244, 2021). Moreover, rPGBS is shown to be computationally efficient across various settings. © The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2023.
Pages: 1139-1164
Page count: 25
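Note on MCP
For readers unfamiliar with the minimax concave penalty (MCP) named in the abstract and keywords, the following is a minimal sketch of its standard textbook form and of the univariate thresholding rule it induces for a standardized predictor. It is not the authors' rPGBS code; the function names `mcp_penalty`, `soft_threshold`, and `mcp_threshold` and the parameter names `lam` and `gamma` are illustrative assumptions, with `gamma > 1` assumed throughout.

```python
import math


def mcp_penalty(t, lam, gamma):
    """Standard MCP value at coefficient t (assumes lam >= 0, gamma > 1)."""
    a = abs(t)
    if a <= gamma * lam:
        # Lasso-like near zero, with the penalization rate relaxed quadratically.
        return lam * a - a * a / (2.0 * gamma)
    # Beyond gamma * lam the penalty is flat, so large coefficients are not shrunk.
    return 0.5 * gamma * lam * lam


def soft_threshold(z, lam):
    """Lasso soft-thresholding operator."""
    return math.copysign(max(abs(z) - lam, 0.0), z)


def mcp_threshold(z, lam, gamma):
    """Univariate MCP solution for a standardized predictor (firm thresholding)."""
    if abs(z) <= gamma * lam:
        return soft_threshold(z, lam) / (1.0 - 1.0 / gamma)
    return z
```

With `lam = 1` and `gamma = 3`, small inputs are set to zero as with the Lasso, while inputs larger than `gamma * lam` pass through unshrunk; that reduced bias on large coefficients is the usual motivation for MCP over the Lasso.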