Stable Variable Selection for High-Dimensional Genomic Data with Strong Correlations

Cited by: 0
Authors
Sarkar R. [1 ]
Manage S. [2 ]
Gao X. [3 ]
Affiliations
[1] Department of Mathematics and Statistics, University of North Carolina at Greensboro, PO Box 26170, 116 Petty Building, Greensboro, NC 27402
[2] Department of Mathematics, Texas A&M University, Blocker Building, 3368 TAMU, 155 Ireland Street, College Station, TX 77840
[3] Meta Platforms, Menlo Park, CA
Funding
National Science Foundation (USA)
Keywords
Bi-level sparsity; Minimax concave penalty; Stability; Strong correlation; Variable selection
DOI
10.1007/s40745-023-00481-5
Abstract
High-dimensional genomic data studies often exhibit strong correlations, which result in instability and inconsistency in the estimates obtained from commonly used regularization approaches such as the Lasso and MCP. In this paper, we perform a comparative study of regularization approaches for variable selection under different correlation structures and propose a two-stage procedure, named rPGBS, to address the issue of stable variable selection in various strong correlation settings. The approach repeatedly runs a two-stage hierarchical procedure consisting of random pseudo-group clustering followed by bi-level variable selection. Extensive simulation studies and analyses of real high-dimensional genomic datasets demonstrate the advantage of the proposed rPGBS method over some of the most widely used regularization methods. In particular, rPGBS yields more stable variable selection across a variety of correlation settings than recent methods addressing variable selection with strong correlations: Precision Lasso (Wang et al. in Bioinformatics 35:1181–1187, 2019) and Whitening Lasso (Zhu et al. in Bioinformatics 37:2238–2244, 2021). Moreover, rPGBS is computationally efficient across various settings. © The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2023.
Pages: 1139–1164
Number of pages: 25
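The abstract describes rPGBS as repeated random pseudo-group clustering followed by bi-level (group and within-group) variable selection, with results aggregated for stability. Below is a minimal, hypothetical Python sketch of that general idea only; the function name rpgbs_sketch, the uniform random grouping, the marginal-correlation group screen, and the within-group Lasso are all assumptions standing in for the paper's actual bi-level penalty, which is not reproduced here.

```python
# Hypothetical sketch of the rPGBS idea (not the authors' code):
# repeat {random pseudo-grouping -> bi-level selection}, then report how
# often each variable is selected as a stability score.  The bi-level step
# is approximated by a group-level screen plus a within-group Lasso.
import numpy as np
from sklearn.linear_model import LassoCV


def rpgbs_sketch(X, y, n_repeats=50, n_groups=20, seed=0):
    """Return per-variable selection frequencies over repeated random groupings."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    yc = y - y.mean()
    screen_cutoff = np.quantile(np.abs(X.T @ yc), 0.5)  # crude group screen (assumed)

    for _ in range(n_repeats):
        # Stage 1: random pseudo-group clustering (assumed uniform assignment).
        groups = rng.integers(0, n_groups, size=p)
        selected = np.zeros(p, dtype=bool)

        # Stage 2: bi-level selection, approximated group by group.
        for g in range(n_groups):
            idx = np.where(groups == g)[0]
            if idx.size == 0:
                continue
            # Group-level screen: keep the group if any member is marginally strong.
            if np.abs(X[:, idx].T @ yc).max() < screen_cutoff:
                continue
            # Within-group sparse selection.
            fit = LassoCV(cv=5).fit(X[:, idx], y)
            selected[idx[np.abs(fit.coef_) > 1e-8]] = True

        counts += selected
    return counts / n_repeats
```

In this sketch, variables whose selection frequency stays high across the repeats would be reported as stably selected; the grouping scheme, screening rule, and any frequency threshold are illustrative choices, not the authors' specification.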