Multivariate Bernoulli distribution

Cited by: 117
Authors
Dai, Bin [1]
Ding, Shilin [2]
Wahba, Grace [3]
Affiliations
[1] Tower Res Capital, New York, NY 10013 USA
[2] Facebook, Menlo Pk, CA 94025 USA
[3] Univ Wisconsin, Dept Stat, Madison, WI 53706 USA
Keywords
Bernoulli distribution; generalized linear models; LASSO; smoothing spline; LASSO-PATTERNSEARCH ALGORITHM; SMOOTHING SPLINE ANOVA; MODEL SELECTION; LIKELIHOOD-ESTIMATION; OPHTHALMOLOGY;
DOI
10.3150/12-BEJSP10
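Chinese Library Classification number
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Discipline classification code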
020208 ; 070103 ; 0714 ;
Abstract
In this paper, we consider the multivariate Bernoulli distribution as a model to estimate the structure of graphs with binary nodes. This distribution is discussed in the framework of the exponential family, and its statistical properties regarding independence of the nodes are demonstrated. Importantly, the model can estimate not only the main effects and pairwise interactions among the nodes but also higher-order interactions, allowing for the existence of complex clique effects. We compare the multivariate Bernoulli model with existing graphical inference models, namely the Ising model and the multivariate Gaussian model, in which only the pairwise interactions are considered. The multivariate Bernoulli distribution also has the interesting property that independence and uncorrelatedness of the component random variables are equivalent. Both the marginal and conditional distributions of a subset of variables in the multivariate Bernoulli distribution still follow the multivariate Bernoulli distribution. Furthermore, the multivariate Bernoulli logistic model is developed under generalized linear model theory by utilizing the canonical link function in order to include covariate information on the nodes, edges and cliques. We also consider variable selection techniques such as LASSO in the logistic model to impose sparsity structure on the graph. Finally, we discuss extending the smoothing spline ANOVA approach to the multivariate Bernoulli logistic model to enable estimation of non-linear effects of the predictor variables.
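The equivalence of independence and uncorrelatedness stated in the abstract can be checked numerically in the simplest case of two binary nodes. The sketch below is only an illustration under stated assumptions: it writes the bivariate Bernoulli distribution in exponential-family form with natural parameters named f1, f2 (main effects) and f12 (pairwise interaction), and the function names are hypothetical rather than taken from the paper. For two nodes, the covariance equals p11*p00 - p10*p01, which vanishes exactly when the interaction term f12 is zero.

```python
import math

def natural_params(p00, p10, p01, p11):
    """Map a 2x2 probability table (pij = P(Y1=i, Y2=j)) to natural parameters.

    Assumes all four cell probabilities are strictly positive and sum to 1.
    """
    f1 = math.log(p10 / p00)                  # main effect of node 1
    f2 = math.log(p01 / p00)                  # main effect of node 2
    f12 = math.log(p11 * p00 / (p10 * p01))   # pairwise interaction (log odds ratio)
    return f1, f2, f12

def pmf(y1, y2, f1, f2, f12):
    """Exponential-family form of the bivariate Bernoulli probability mass function."""
    log_norm = math.log(1 + math.exp(f1) + math.exp(f2) + math.exp(f1 + f2 + f12))
    return math.exp(f1 * y1 + f2 * y2 + f12 * y1 * y2 - log_norm)

def covariance(p00, p10, p01, p11):
    """Cov(Y1, Y2) = E[Y1*Y2] - E[Y1]*E[Y2] for binary Y1, Y2."""
    return p11 - (p10 + p11) * (p01 + p11)

# Independent table: P(Y1=1) = 0.3, P(Y2=1) = 0.6 (illustrative values)
independent = (0.7 * 0.4, 0.3 * 0.4, 0.7 * 0.6, 0.3 * 0.6)
# Dependent table: mass concentrated on the diagonal (illustrative values)
dependent = (0.4, 0.1, 0.1, 0.4)

for table in (independent, dependent):
    f1, f2, f12 = natural_params(*table)
    print(f"f12 = {f12:.6f}, cov = {covariance(*table):.6f}")
```

In the independent table both f12 and the covariance are zero up to floating-point error, while in the dependent table both are positive. With more than two nodes the same parameterization gains higher-order terms (for example, a three-way interaction), which is what distinguishes the model from pairwise-only models such as the Ising model.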
Pages: 1465-1483
Number of pages: 19