A nonparametric Bayesian technique for high-dimensional regression

Cited by: 5
Authors
Guha, Subharup [1]
Baladandayuthapani, Veerabhadran [2]
Affiliations
[1] Dept Stat, 307D Middlebush Hall, Columbia, MO 65211 USA
[2] Dept Biostat, 1400 Pressler St, Houston, TX 77030 USA
Funding
National Science Foundation (USA); National Institutes of Health (USA)
Keywords
Dirichlet process; local clustering; model-based clustering; nonparametric Bayes; Poisson-Dirichlet process; VARIABLE SELECTION; SCALE MIXTURES; GENE; MODEL; SURVIVAL; AMPLIFICATION; CHEMOTHERAPY; RESISTANCE; INFERENCE; BINARY;
DOI
10.1214/16-EJS1184
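Chinese Library Classification number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract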
This paper proposes a nonparametric Bayesian framework called VariScan for simultaneous clustering, variable selection, and prediction in high-throughput regression settings. Poisson-Dirichlet processes are utilized to detect lower-dimensional latent clusters of covariates. An adaptive nonlinear prediction model is constructed for the response, achieving a balance between model parsimony and flexibility. Contrary to conventional belief, cluster detection is shown to be a posteriori consistent for a general class of models as the number of covariates and subjects grows. Simulation studies and data analyses demonstrate that VariScan often outperforms several well-known statistical methods.
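The abstract does not spell out how a Poisson-Dirichlet process induces clusters, so the following is only a minimal sketch of the standard Chinese-restaurant-style sampling scheme for the two-parameter Poisson-Dirichlet (Pitman-Yor) process. It is not the VariScan algorithm. The function name, the parameter names `discount` and `concentration`, and the values used are assumptions chosen for illustration. Setting `discount` to 0 gives the Dirichlet process as a special case.

```python
import random


def sample_pitman_yor_partition(n_items, discount, concentration, seed=None):
    """Draw a random partition of n_items from a two-parameter
    Poisson-Dirichlet (Pitman-Yor) process via sequential seating.

    Hypothetical helper for illustration; not taken from the paper.
    Returns (labels, sizes): labels[i] is the cluster index of item i,
    sizes[j] is the number of items in cluster j.
    """
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must lie in [0, 1)")
    if concentration <= -discount:
        raise ValueError("concentration must exceed -discount")

    rng = random.Random(seed)
    labels = []
    sizes = []
    for _ in range(n_items):
        k = len(sizes)
        if k == 0:
            # The first item always opens the first cluster.
            choice = 0
        else:
            # Existing cluster j has weight (size_j - discount);
            # a new cluster has weight (concentration + k * discount).
            weights = [size - discount for size in sizes]
            weights.append(concentration + k * discount)
            choice = rng.choices(range(k + 1), weights=weights)[0]
        if choice == k:
            sizes.append(1)
        else:
            sizes[choice] += 1
        labels.append(choice)
    return labels, sizes


if __name__ == "__main__":
    # Illustrative values only: 50 covariates, discount 0.5, concentration 1.0.
    labels, sizes = sample_pitman_yor_partition(50, 0.5, 1.0, seed=0)
    print("number of clusters:", len(sizes))
    print("cluster sizes:", sizes)
```

In this scheme a larger `discount` tends to produce more clusters with a heavier tail of small ones, which is one reason the two-parameter family is often preferred over the plain Dirichlet process when many small groups are expected.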
Pages: 3374-3424
Page count: 51