A general algorithm for covariance modeling of discrete data

Cited by: 16
Authors
Popovic, Gordana C. [1]
Hui, Francis K. C. [2]
Warton, David I. [1, 3]
Affiliations
[1] Univ New South Wales, Sch Math & Stat, Sydney, NSW 2052, Australia
[2] Australian Natl Univ, Math Sci Inst, Acton, ACT 2601, Australia
[3] Univ New South Wales, Evolut & Ecol Res Ctr, Sydney, NSW 2052, Australia
Funding
Australian Research Council
Keywords
Factor analysis; Gaussian copula; Graphical model; Overdispersed count data; Species interaction; Selection
DOI
10.1016/j.jmva.2017.12.002
Chinese Library Classification (CLC) number
O21 [Probability and Mathematical Statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract
We propose an algorithm that generalizes to discrete data any given covariance modeling algorithm originally intended for Gaussian responses, via a Gaussian copula approach. Covariance modeling is a powerful tool for extracting meaning from multivariate data, and fast algorithms for Gaussian data, such as factor analysis and Gaussian graphical models, are widely available. Our algorithm makes these tools generally available to analysts of discrete data and can combine any likelihood-based covariance modeling method for Gaussian data with any set of discrete marginal distributions. Previously, tools for discrete data were generally specific to one family of distributions or one covariance modeling paradigm, or did not exist at all. Our algorithm is more flexible than alternative methods and takes advantage of existing fast algorithms for Gaussian data; simulations suggest that it outperforms competing graphical modeling and factor analysis procedures for count and binomial data. We additionally show that in a Gaussian copula graphical model with discrete margins, conditional independence relationships in the latent Gaussian variables are inherited by the discrete observations. Our method is illustrated with a graphical model and factor analysis on an overdispersed ecological count dataset of species abundances. © 2017 Elsevier Inc. All rights reserved.
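The Gaussian copula idea in the abstract can be illustrated with a small sketch. This is not the authors' algorithm, which the abstract does not spell out; it only shows the general link between a discrete observation and a latent Gaussian variable. Under a Gaussian copula with marginal CDF F, an observed count y corresponds to the latent interval from Φ⁻¹(F(y − 1)) to Φ⁻¹(F(y)). The Poisson margins, the fixed means, the toy counts, and every function name below are assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard Gaussian: mean 0, sd 1


def poisson_cdf(k, mu):
    """P(Y <= k) for Y ~ Poisson(mu); returns 0.0 when k < 0."""
    if k < 0:
        return 0.0
    term = math.exp(-mu)  # P(Y = 0)
    total = term
    for i in range(1, k + 1):
        term *= mu / i  # P(Y = i) from P(Y = i - 1)
        total += term
    return min(total, 1.0)


def probit(p):
    """Standard normal quantile, extended to the endpoints 0 and 1."""
    if p <= 0.0:
        return -math.inf
    if p >= 1.0:
        return math.inf
    return STD_NORMAL.inv_cdf(p)


def latent_interval(y, mu):
    """Range of latent Gaussian values that map to the observed count y."""
    return probit(poisson_cdf(y - 1, mu)), probit(poisson_cdf(y, mu))


def randomized_score(y, mu, rng):
    """Draw one latent value consistent with y (a randomized quantile score)."""
    lo, hi = poisson_cdf(y - 1, mu), poisson_cdf(y, mu)
    u = lo + (hi - lo) * rng.random()
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep the quantile finite
    return probit(u)


def correlation(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up toy counts for two hypothetical species; the means are assumed known.
rng = random.Random(0)
counts_a = [3, 7, 2, 9, 4, 6]
counts_b = [1, 5, 0, 8, 2, 4]
z_a = [randomized_score(y, 5.0, rng) for y in counts_a]
z_b = [randomized_score(y, 3.0, rng) for y in counts_b]
latent_corr = correlation(z_a, z_b)  # Gaussian-scale input for a covariance method
```

Scores on the Gaussian scale, such as `z_a` and `z_b`, are the kind of input that Gaussian covariance tools (factor analysis, graphical models) expect. A single random draw per observation is only a rough stand-in for a full likelihood-based treatment of the latent variables.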
Pages: 86-100
Page count: 15