The General Projected Normal Distribution of Arbitrary Dimension: Modeling and Bayesian Inference

Cited by: 37
Authors
Hernandez-Stumpfhauser, Daniel [1]
Breidt, F. Jay [2]
van der Woerd, Mark J. [3]
Affiliations
[1] Univ N Carolina, Dept Biostat, Chapel Hill, NC 27599 USA
[2] Colorado State Univ, Dept Stat, Ft Collins, CO 80523 USA
[3] Colorado State Univ, Dept Biochem & Mol Biol, Ft Collins, CO 80523 USA
Source
BAYESIAN ANALYSIS | 2017 / Volume 12 / Issue 01
Keywords
circular data; directional data; Gibbs sampler; Markov chain Monte Carlo; protein structure analysis; spherical data; REPRESENTATION; REGRESSION
DOI
10.1214/15-BA989
Chinese Library Classification number
O1 [Mathematics]
Discipline classification code
0701; 070101
Abstract
The general projected normal distribution is a simple and intuitive model for directional data in any dimension: a multivariate normal random vector divided by its length is the projection of that vector onto the surface of the unit hypersphere. Observed data consist of the projections, but not the lengths. Inference for this model has been restricted to the two-dimensional (circular) case, using Bayesian methods with data augmentation to generate the latent lengths and a Metropolis-within-Gibbs algorithm to sample from the posterior. We describe a new parameterization of the general projected normal distribution that makes inference in any dimension tractable, including the important three-dimensional (spherical) case, which has not previously been considered. Under this new parameterization, the full conditionals of the unknown parameters have closed forms, and we propose a new slice sampler to draw the latent lengths without the need for rejection. Gibbs sampling with this new scheme is fast and easy, leading to improved Bayesian inference; for example, it is now feasible to conduct model selection among complex mixture and regression models for large data sets. Our parameterization also allows straightforward incorporation of covariates into the covariance matrix of the multivariate normal, increasing the ability of the model to explain directional data as a function of independent regressors. Circular and spherical cases are considered in detail and illustrated with scientific applications. For the circular case, seasonal variation in time-of-day departures of anglers from recreational fishing sites is modeled using covariates in both the mean vector and covariance matrix. For the spherical case, we consider paired angles that describe the relative positions of carbon atoms along the backbone chain of a protein. 
We fit mixtures of general projected normals to these data, with the best-fitting mixture accurately describing biologically meaningful structures including helices, beta-sheets, and coils and turns. Finally, we show via simulation that our methodology has satisfactory performance in some 10-dimensional and 50-dimensional problems.
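The construction described in the abstract (a multivariate normal vector divided by its length) can be illustrated with a minimal sketch. This is not the authors' sampler or parameterization; it only shows how a single draw from a projected normal arises. The function name `sample_projected_normal`, the use of a lower-triangular covariance factor `chol`, and the example values of `mu` and `chol` are assumptions made for illustration.

```python
import math
import random


def sample_projected_normal(mu, chol, rng):
    """Draw one direction u = x / ||x||, where x ~ N(mu, chol * chol^T).

    mu   : list of floats, mean vector of length k (hypothetical input)
    chol : lower-triangular k x k factor of the covariance, as a list of rows
    rng  : random.Random instance
    Returns (u, r); r = ||x|| is the latent length, which is not observed
    in directional data.
    """
    k = len(mu)
    # Independent standard normal draws.
    z = [rng.gauss(0.0, 1.0) for _ in range(k)]
    # x = mu + chol @ z gives a multivariate normal vector.
    x = [mu[i] + sum(chol[i][j] * z[j] for j in range(i + 1)) for i in range(k)]
    # Project onto the unit hypersphere by dividing by the length.
    r = math.sqrt(sum(v * v for v in x))
    u = [v / r for v in x]
    return u, r


# Illustrative values only (spherical case, k = 3).
rng = random.Random(0)
mu = [1.0, 0.5, -0.2]
chol = [[1.0, 0.0, 0.0],
        [0.3, 0.8, 0.0],
        [0.0, 0.2, 0.5]]
u, r = sample_projected_normal(mu, chol, rng)
```

Only `u` would be available as data; inference for this model, as the abstract notes, treats the length `r` as a latent quantity to be generated by data augmentation.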
Pages: 113-133
Number of pages: 21