Bayesian feature selection in high-dimensional regression in presence of correlated noise

Cited: 1
Authors
Feldman, Guy [1 ]
Bhadra, Anindya [1 ]
Kirshner, Sergey [1 ]
Affiliation
[1] Purdue Univ, Dept Stat, 250 N Univ St, W Lafayette, IN 47907 USA
Source
STAT | 2014 / Vol. 3 / No. 01
Keywords
Bayesian methods; genomics; graphical models; high-dimensional data; variable selection;
DOI
10.1002/sta4.60
CLC Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Discipline Codes
020208; 070103; 0714;
Abstract
We consider the problem of feature selection in a high-dimensional regression setting with multiple predictors and multiple responses. Assuming that the regression errors are i.i.d. when they are in fact dependent leads to inconsistent and inefficient feature estimates. We relax the i.i.d. assumption by allowing the errors to exhibit a tree-structured dependence. This allows a Bayesian problem formulation in which the error dependence structure is treated as an auxiliary variable that can be integrated out analytically with the help of the matrix-tree theorem. Mixing over trees results in a flexible technique for modelling the graphical structure of the regression errors. Furthermore, the analytic integration yields a collapsed Gibbs sampler for feature selection that is computationally efficient. Our approach offers significant performance gains over competing methods in simulations, especially when the features themselves are correlated. In addition to comprehensive simulation studies, we apply our method to a high-dimensional breast cancer data set to identify markers significantly associated with the disease. Copyright (C) 2014 John Wiley & Sons, Ltd.
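The matrix-tree theorem invoked in the abstract states that the sum, over all spanning trees of a weighted graph, of the product of edge weights equals any cofactor of the graph Laplacian; this is what lets the tree-structured dependence be integrated out in closed form. The sketch below is not the authors' sampler, only a minimal illustration of that theorem, checked against Cayley's formula (a complete graph on n nodes with unit weights has n^(n-2) spanning trees):

```python
import numpy as np

def spanning_tree_weight_sum(W):
    """Sum over all spanning trees of the product of edge weights.

    By the weighted matrix-tree theorem, this equals any cofactor of
    the graph Laplacian L = diag(W @ 1) - W, where W is a symmetric
    nonnegative edge-weight matrix with zero diagonal.
    """
    W = np.asarray(W, dtype=float)
    L = np.diag(W.sum(axis=1)) - W
    # Cofactor: delete the first row and column, take the determinant.
    return np.linalg.det(L[1:, 1:])

# Complete graph on 3 nodes, unit weights: Cayley gives 3^(3-2) = 3 trees.
W3 = np.ones((3, 3)) - np.eye(3)
print(round(spanning_tree_weight_sum(W3)))  # 3
```

In the paper's setting the edge weights would come from the error model, so this determinant gives the normalising constant of the mixture over trees without enumerating them.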
Pages: 258-272
Page count: 15
Related Papers
50 in total
  • [31] Maximal cliques-based hybrid high-dimensional feature selection with interaction screening for regression
    Chamlal, Hasna
    Benzmane, Asmaa
    Ouaderhman, Tayeb
    NEUROCOMPUTING, 2024, 607
  • [32] A systematic review on model selection in high-dimensional regression
    Lee, Eun Ryung
    Cho, Jinwoo
    Yu, Kyusang
    JOURNAL OF THE KOREAN STATISTICAL SOCIETY, 2019, 48 (01) : 1 - 12
  • [33] Feature Selection for High-Dimensional Data: The Issue of Stability
    Pes, Barbara
    2017 IEEE 26TH INTERNATIONAL CONFERENCE ON ENABLING TECHNOLOGIES - INFRASTRUCTURE FOR COLLABORATIVE ENTERPRISES (WETICE), 2017, : 170 - 175
  • [34] Cluster feature selection in high-dimensional linear models
    Lin, Bingqing
    Pang, Zhen
    Wang, Qihua
    RANDOM MATRICES-THEORY AND APPLICATIONS, 2018, 7 (01)
  • [35] A hybrid feature selection method for high-dimensional data
    Taheri, Nooshin
    Nezamabadi-pour, Hossein
    2014 4TH INTERNATIONAL CONFERENCE ON COMPUTER AND KNOWLEDGE ENGINEERING (ICCKE), 2014, : 141 - 145
  • [36] Improved PSO for Feature Selection on High-Dimensional Datasets
    Tran, Binh
    Xue, Bing
    Zhang, Mengjie
    SIMULATED EVOLUTION AND LEARNING (SEAL 2014), 2014, 8886 : 503 - 515
  • [37] Clustering high-dimensional data via feature selection
    Liu, Tianqi
    Lu, Yu
    Zhu, Biqing
    Zhao, Hongyu
    BIOMETRICS, 2023, 79 (02) : 940 - 950
  • [38] High-Dimensional Software Engineering Data and Feature Selection
    Wang, Huanjing
    Khoshgoftaar, Taghi M.
    Gao, Kehan
    Seliya, Naeem
    ICTAI: 2009 21ST INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE, 2009, : 83 - +
  • [39] Bayesian Function-on-Scalars Regression for High-Dimensional Data
    Kowal, Daniel R.
    Bourgeois, Daniel C.
    JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS, 2020, 29 (03) : 629 - 638
  • [40] Evaluating Feature Selection Robustness on High-Dimensional Data
    Pes, Barbara
    HYBRID ARTIFICIAL INTELLIGENT SYSTEMS (HAIS 2018), 2018, 10870 : 235 - 247